Music Genre Classification using Deep Learning¶
Context¶
Objective¶
Train a deep learning model to classify songs into different music genres (e.g., Rock, Jazz, Pop, Classical).
Dataset¶
Contains 1,000 audio tracks, each 30 seconds long
10 genres: Blues, Classical, Country, Disco, Hip-Hop, Jazz, Metal, Pop, Reggae, and Rock
Each genre has 100 tracks
genres original - A collection of 10 genres with 100 audio files each, all 30 seconds long (the famous GTZAN dataset, often called the MNIST of sounds).
images original - A visual representation of each audio file. Because neural networks (such as the CNN we will use here) usually take some form of image representation as input, the audio files were converted to Mel Spectrograms to make this possible.
2 CSV files - Containing features of the audio files. One file holds, for each 30-second song, the mean and variance of multiple features extracted from the audio. The other file has the same structure, but each song was first split into 3-second clips, increasing the amount of data fed to our classification models tenfold. With data, more is usually better. A minimal loading sketch follows the list below.
- features_30_sec.csv
- features_3_sec.csv
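As a quick sanity check, the precomputed feature tables can be loaded directly with pandas. This is a minimal sketch, not part of the pipeline; the paths assume the CSVs sit in the working directory after extraction, so adjust them to your layout.
import pandas as pd

# Hypothetical paths: adjust to wherever the archive was extracted
df_30 = pd.read_csv("features_30_sec.csv")  # one row per 30-second track
df_3 = pd.read_csv("features_3_sec.csv")    # one row per 3-second slice (~10x the rows)

print(df_30.shape, df_3.shape)
print(df_30.columns[:5].tolist())  # typically filename, length, then mean/variance feature pairs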
Post change (folders renamed to numeric class ids; a small mapping helper follows the list):
- 01 blues
- 02 classical
- 03 country
- 04 disco
- 05 hiphop
- 06 jazz
- 07 metal
- 08 pop
- 09 reggae
- 10 rock
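For readability later, the numeric folders can be mapped back to genre names. A small helper, taken directly from the list above:
# Map the renamed numeric folders back to genre names (from the list above)
GENRE_MAP = {
    1: "blues", 2: "classical", 3: "country", 4: "disco", 5: "hiphop",
    6: "jazz", 7: "metal", 8: "pop", 9: "reggae", 10: "rock",
}

def folder_to_genre(folder_name):
    # e.g., folder_to_genre("07") -> "metal"
    return GENRE_MAP[int(folder_name)]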
Importing the necessary libraries and loading the data¶
from google.colab import drive
drive.mount('/content/drive')
Mounted at /content/drive
!pip install librosa
Requirement already satisfied: librosa in /usr/local/lib/python3.11/dist-packages (0.10.2.post1) (dependency output trimmed; all requirements were already satisfied)
# For Audio Preprocessing
import librosa
import librosa.display as dsp
from IPython.display import Audio
# For Data Preprocessing
import pandas as pd
import numpy as np
import os
# For Data Visualization
import matplotlib.pyplot as plt
import seaborn as sns
from tqdm import tqdm
#The data is provided as a zip file
import zipfile
import os
sns.set_style("dark") # This sets the style of the plots to "dark", meaning the background of the plots will have a dark theme.
Load the Dataset¶
# Import Zip Files
path = '/content/drive/MyDrive/My DS DA/Neural Networks (Deep Learning)/Music Classification/archive.zip'
#The data is provided as a zip file so we need to extract the files from the zip file
with zipfile.ZipFile(path, 'r') as zip_ref:
zip_ref.extractall()
import os
directory_path = "/content/data"
# List all files and directories
files = os.listdir(directory_path)
print("Files and directories in '/content/data':")
for file in files:
print(file)
Files and directories in '/content/data': 07 02 09 08 06 04 01 03 10 05
# import os
# directory_path = "/content/data"
# # Walk through the directory
# for root, dirs, files in os.walk(directory_path):
# print(f"Directory: {root}")
# for file in files:
# print(f" - {file}")
Directory: /content/data, with subdirectories 01 through 10, each listing 100 .wav files named 0_<folder>_<index>.wav (for example, 0_07_74.wav). Full per-file listing trimmed.
Extract
# import zipfile
# import os
# zip_path = "/content/drive/MyDrive/My DS DA/Neural Networks (Deep Learning)/Music Classification/archive.zip"
# extract_path = "/content/drive/MyDrive/My DS DA/Neural Networks (Deep Learning)/Music Classification"
# # Extract if not already extracted
# if not os.path.exists(os.path.join(extract_path, "data")):
# with zipfile.ZipFile(zip_path, 'r') as zip_ref:
# zip_ref.extractall(extract_path)
# print("✅ Extraction Complete!")
# else:
# print("⚠️ Files already extracted.")
Verify Extraction
# import os
# print("📂 Extracted Folders:", os.listdir(extract_path))
Audio samples from the dataset (one 30-second clip per file, organized in the numbered class folders).
Functions used to create the get_audio() helper (a minimal one-file sketch follows the list):
- .wav: a file format that stores raw audio, much as .csv stores raw tabular data. We will load the .wav files using the librosa package.
- dsp.waveshow(): visualizes the waveform in the time domain. Depending on zoom level, the plot alternates between a raw sample-based view of the signal and an amplitude-envelope view. The "sr" parameter is the sampling rate, i.e., samples per second.
- Audio(): from the IPython.display module, creates a playable audio object.
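A minimal sketch of these three building blocks on a single file, before wrapping them in the helper below. The path is illustrative (any .wav under /content/data works), and the imports from the cells above are reused.
# Illustrative single-file example (path assumed from the listing above)
example_path = "/content/data/01/0_01_0.wav"
data, sample_rate = librosa.load(example_path, sr=22050)  # decode the .wav

dsp.waveshow(data, sr=sample_rate)  # time-domain waveform
plt.show()

Audio(data=data, rate=sample_rate)  # playable widget in the notebook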
def get_audio(digit=0):
root_dir = "/content/data/"
available_folders = [f for f in os.listdir(root_dir) if os.path.isdir(os.path.join(root_dir, f))]
if not available_folders:
print(f"⚠️ No folders found in {root_dir}")
return None
# Pick a random folder
sample_folder = np.random.choice(available_folders)
folder_path = os.path.join(root_dir, sample_folder)
    # List files in the chosen folder whose trailing index starts with the requested digit
    available_files = [f for f in os.listdir(folder_path) if f.split("_")[2].startswith(str(digit))]
if not available_files:
print(f"⚠️ No files found for digit {digit} in {folder_path}")
return None
# Pick a random file
file_name = np.random.choice(available_files)
file_path = os.path.join(folder_path, file_name)
# Load and display audio
data, sample_rate = librosa.load(file_path, sr=22050)
librosa.display.waveshow(data, sr=sample_rate)
plt.show()
    # Return a playable audio widget (Audio was imported from IPython.display above)
    return Audio(data=data, rate=sample_rate)
# Show the audio and plot of digit 0
get_audio(0)
# Show the audio and plot of digit 1
get_audio(1)
# Show the audio and plot of digit 2
get_audio(2)
# Show the audio and plot of digit 9
get_audio(9)
Visualizing the spectrogram of the audio data¶
- A spectrogram is a visual way of representing the signal strength, or "loudness", of a signal over time at the various frequencies present in a waveform. It gives a detailed view of audio, representing amplitude, frequency, and time in a single plot. Since spectrograms are continuous plots, they can be interpreted as images. Different spectrograms carry different attributes on their axes, so they can differ in how they are read. In research and development settings, a vocoder (an encoder that converts spectrograms back to audio using parameters learned by machine learning) is used to invert them; one well-known example is the WaveNet vocoder, used in almost all text-to-speech architectures. A short rendering sketch follows.
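A short sketch (not part of the pipeline) that renders the Mel spectrogram the paragraph above describes. The file path is illustrative; the librosa, numpy, and matplotlib imports from earlier cells are reused.
# Render a Mel spectrogram for a single clip (illustrative path)
audio, sr = librosa.load("/content/data/01/0_01_0.wav", sr=22050)

# Power mel spectrogram, converted to decibels for display
S = librosa.feature.melspectrogram(y=audio, sr=sr, n_mels=128)
S_db = librosa.power_to_db(S, ref=np.max)

librosa.display.specshow(S_db, sr=sr, x_axis="time", y_axis="mel")
plt.colorbar(format="%+2.0f dB")
plt.title("Mel spectrogram")
plt.show()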
def get_audio_raw(digit=0):
root_dir = "/content/data/"
# Get all available folders in /content/data/
available_folders = sorted(os.listdir(root_dir))
# Pick a random folder
sample_folder = np.random.choice(available_folders)
folder_path = os.path.join(root_dir, sample_folder)
# Get all files in the selected folder
available_files = [f for f in os.listdir(folder_path) if f.endswith('.wav')]
# Filter files that match the digit (third element in the filename)
digit_files = [f for f in available_files if f.split("_")[2].startswith(str(digit))]
if not digit_files:
print(f"⚠️ No files found for digit {digit} in {folder_path}")
return None, None # Return None to avoid errors
# Pick a random file
file_name = np.random.choice(digit_files)
file_path = os.path.join(folder_path, file_name)
# Load audio only if the file exists
if not os.path.exists(file_path):
print(f"⚠️ File not found: {file_path}")
return None, None # Avoid loading a missing file
# Load audio
audio, sample_rate = librosa.load(file_path, sr=22050)
return audio, sample_rate
Extracting features from the audio file
Mel-frequency cepstral coefficients (MFCCs) Feature Extraction
MFCCs are often the final features used in machine learning models trained on audio data. They are a set of mel coefficients computed for each time step, through which the raw audio can be encoded. For example, if an audio sample spans 30 time steps and each time step is described by 40 mel coefficients, the entire sample can be represented by 40 × 30 = 1,200 coefficients. Plotted as a Mel spectrogram, this becomes a 2-D array with 40 rows and 30 columns.
In this step, we will first extract the mel coefficients for each audio file and add them to our dataset.
- extract_features : returns the MFCC features extracted from an audio file.
- preprocess_and_create_dataset : iterates through the audio files of each class, extracts the features using extract_features(), and appends the data to a DataFrame. A quick shape sanity check follows this list.
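A minimal sanity check of the shapes described above, assuming an illustrative file path and the imports from earlier cells:
# Verify the MFCC shapes on one clip (illustrative path)
audio, sr = librosa.load("/content/data/01/0_01_0.wav", sr=22050)
mfcc = librosa.feature.mfcc(y=audio, sr=sr, n_mfcc=40)
print(mfcc.shape)  # (40, n_frames): 40 coefficients per time frame

# Mean pooling across time collapses this to a single 40-dimensional vector per clip
pooled = np.mean(mfcc.T, axis=0)
print(pooled.shape)  # (40,)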
Creating a function that extracts the data from audio files
# Function to extract MFCC features from an audio file
def extract_features(file):
try:
# Load audio and its sample rate
audio, sample_rate = librosa.load(file, sr=22050)
# Extract MFCC features
extracted_features = librosa.feature.mfcc(y=audio, sr=sample_rate, n_mfcc=40)
# Scale the extracted features (Mean pooling)
extracted_features = np.mean(extracted_features.T, axis=0)
return extracted_features
except Exception as e:
print(f"⚠️ Error processing {file}: {e}")
return None # Return None if error occurs
# Function to preprocess the dataset
def preprocess_and_create_dataset():
root_folder_path = "/content/data/"
dataset = []
# Iterate through folders (01-10)
for folder in tqdm(range(1, 11)):
print(f'\nProcessing folder: {folder}')
# Ensure folder names are formatted as '01', '02', ..., '10'
folder_name = f"{folder:02d}"
folder_path = os.path.join(root_folder_path, folder_name)
# Ensure folder exists
if not os.path.exists(folder_path):
print(f"⚠️ Skipping missing folder: {folder_path}")
continue
# Iterate through files in the folder
for file in tqdm(os.listdir(folder_path)):
abs_file_path = os.path.join(folder_path, file)
# Extract features
extracted_features = extract_features(abs_file_path)
# Skip if feature extraction failed
if extracted_features is None:
continue
            # Extract the class label: the second underscore-separated element
            # of the filename is the folder/class number (e.g., "0_07_74.wav" -> 7)
            try:
                class_label = int(file.split("_")[1])
except ValueError:
print(f"⚠️ Skipping file {file} due to incorrect format.")
continue
# Append to dataset
dataset.append([extracted_features, class_label])
# Convert dataset to DataFrame
df = pd.DataFrame(dataset, columns=['features', 'class'])
# Convert 'features' column to a NumPy array for efficiency
df['features'] = df['features'].apply(lambda x: np.array(x))
return df
Create the dataset using the defined function
Walking through "preprocess_and_create_dataset()" step by step for a single file
# # Set folder number
# root_folder_path = "/content/data/"
# folder = os.path.join(root_folder_path, "0" + str(1))
# folder
# # Set path
# file = os.listdir(folder)[0]
# abs_file_path = os.path.join(folder, file)
# abs_file_path
# # Extract features using mel-frequency coefficient
# audio, sample_rate = librosa.load(abs_file_path)
# extracted_features = librosa.feature.mfcc(y = audio, sr = sample_rate, n_mfcc = 40)
# audio.shape, sample_rate
# print(f'This audio file lasts {audio.shape[0]/sample_rate} seconds')
# extracted_features.shape
# # Shape is (n_mfcc=40, n_frames): one row per coefficient, one column per time frame
# # Increase the printed number of columns.
# np.set_printoptions(linewidth=150)
# # Scale the extracted features
# extracted_features = np.mean(extracted_features.T, axis = 0)
# np.set_printoptions(linewidth=100)
# extracted_features
# extracted_features.shape
# # Class label: second underscore-separated element of the filename
# class_label = int(file.split("_")[1])
# class_label
# dataset = []
# # Append a list where the feature represents a column and class of the digit represents another column
# dataset.append([extracted_features, class_label])
# dataset[0][0]
# dataset[0][1]
# %%time
# # Create the dataset by calling the function
# dataset = preprocess_and_create_dataset()
dataset = preprocess_and_create_dataset()
print(dataset.head())
print(dataset['class'].value_counts()) # Check if multiple classes are detected correctly
Processing folders 1 through 10 (per-file tqdm progress bars trimmed; each folder of 100 files took roughly 7-10 seconds).
While processing folder 6:
UserWarning: PySoundFile failed. Trying audioread instead.
⚠️ Error processing /content/data/06/0_06_50.wav:
100%|██████████| 10/10 [01:26<00:00, 8.63s/it]
                                            features  class
0  [-288.7327, 105.90115, 18.776207, 23.682646, 5...      1
1  [-107.203255, 88.49289, -4.1719, 55.477848, -8...      1
2  [-159.5804, 69.806015, -4.402107, 76.845116, 3...      1
3  [-95.44059, 105.23433, -26.953482, 60.816486, ...      1
4  [-350.35263, 169.53174, 31.771353, 16.71844, 2...      1
class
1     100
2     100
3     100
4     100
5     100
7     100
8     100
9     100
10    100
6      99
Name: count, dtype: int64
View first 5 rows of the data
# View the head of the DataFrame
dataset.head()
| | features | class |
|---|---|---|
| 0 | [-288.7327, 105.90115, 18.776207, 23.682646, 5... | 1 |
| 1 | [-107.203255, 88.49289, -4.1719, 55.477848, -8... | 1 |
| 2 | [-159.5804, 69.806015, -4.402107, 76.845116, 3... | 1 |
| 3 | [-95.44059, 105.23433, -26.953482, 60.816486, ... | 1 |
| 4 | [-350.35263, 169.53174, 31.771353, 16.71844, 2... | 1 |
dataset.shape
(999, 2)
dataset.dtypes
| | dtype |
|---|---|
| features | object |
| class | int64 |
# Ensure the class label is stored as int (the dtypes above show it is already int64, so this is a safeguard)
dataset['class'] = [int(x) for x in dataset['class']]
# Check the frequency of classes in the dataset
dataset['class'].value_counts()
| class | count |
|---|---|
| 1 | 100 |
| 2 | 100 |
| 3 | 100 |
| 4 | 100 |
| 5 | 100 |
| 7 | 100 |
| 8 | 100 |
| 9 | 100 |
| 10 | 100 |
| 6 | 99 |
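Class 6 has 99 rows instead of 100 because one file failed to decode during feature extraction (see the ⚠️ error for /content/data/06/0_06_50.wav in the log above). A hedged sketch to enumerate any undecodable files, assuming the same directory layout:
# Scan for audio files that fail to decode (explains the 99-count for class 6)
bad_files = []
root = "/content/data/"
for folder in sorted(os.listdir(root)):
    folder_path = os.path.join(root, folder)
    if not os.path.isdir(folder_path):
        continue
    for fname in os.listdir(folder_path):
        try:
            librosa.load(os.path.join(folder_path, fname), sr=22050)
        except Exception:
            bad_files.append(os.path.join(folder_path, fname))

print(bad_files)  # expected to include /content/data/06/0_06_50.wav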
Visualizing the Mel Frequency Cepstral Coefficients Using a Spectrogram¶
- draw_spectrograms : returns the MFCC matrix extracted for a given audio clip, which we then plot as a 2-D image with the X-axis representing time and the Y-axis the mel coefficient index at each time step.
# Function to extract MFCCs (the plotting happens in the next cell)
def draw_spectrograms(audio_data, sample_rate):
    # Extract a (40, n_frames) MFCC matrix
    extracted_features = librosa.feature.mfcc(y=audio_data, sr=sample_rate, n_mfcc=40)
    # Return the raw coefficients without scaling
    return extracted_features
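As a quick sanity check (a minimal sketch, reusing the get_audio_raw helper defined earlier), a single call returns a (40, n_frames) matrix; for a 30-second clip at 22,050 Hz with librosa's default hop length of 512, n_frames comes out around 1,293:

# Minimal sketch: extract MFCCs for one clip and inspect the shape
audio_data, sample_rate = get_audio_raw(0)
mfcc = draw_spectrograms(audio_data, sample_rate)
print(mfcc.shape)  # roughly (40, 1293) for a 30 s clip at sr = 22050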
Plot the MFCCs for one sample per class. Note that it is hard to tell by eye what kind of signal hides behind such a representation.
# Creating subplots: 5 rows x 2 columns
fig, ax = plt.subplots(5, 2, figsize = (15, 30))

# Initializing row and column indices for the subplot grid
row = 0
column = 0

for digit in range(10):
    # Get one audio sample per class (0-9)
    audio_data, sample_rate = get_audio_raw(digit)
    # Extract its MFCC matrix
    mfcc = draw_spectrograms(audio_data, sample_rate)
    print(f"Shape of MFCC of audio digit {digit} ---> ", mfcc.shape)

    # Plot the MFCCs and set the subplot title
    ax[row,column].set_title(f"MFCC of audio class {digit} across time")
    librosa.display.specshow(mfcc, sr = 22050, ax = ax[row, column])

    # Set X-label and Y-label
    ax[row,column].set_xlabel("Time")
    ax[row,column].set_ylabel("MFCC Coefficients")

    # Advance to the next subplot position (fill each row's two columns, then move down)
    if column == 1:
        column = 0
        row += 1
    else:
        column += 1

plt.tight_layout(pad = 3)
plt.show()
Shape of MFCC of audio digit 0 ---> (40, 1298) Shape of MFCC of audio digit 1 ---> (40, 1293) Shape of MFCC of audio digit 2 ---> (40, 1293) Shape of MFCC of audio digit 3 ---> (40, 1320) Shape of MFCC of audio digit 4 ---> (40, 1301) Shape of MFCC of audio digit 5 ---> (40, 1293) Shape of MFCC of audio digit 6 ---> (40, 1293) Shape of MFCC of audio digit 7 ---> (40, 1293) Shape of MFCC of audio digit 8 ---> (40, 1293) Shape of MFCC of audio digit 9 ---> (40, 1293)
Each image shows the MFCC spectrogram for one of the 10 audio classes (labeled 0-9, one per genre).
MFCCs (Mel Frequency Cepstral Coefficients) capture the spectral properties of sound, which makes them useful for distinguishing different audio patterns.
The X-axis represents time (the progression of the audio signal).
The Y-axis represents the MFCC coefficient index (features extracted from the sound frequencies).
These MFCC spectrograms show how the spectral energy (frequency content) of the audio changes over time. Here's how to interpret the colors:
- Red & Orange Areas → Higher energy (louder sounds)
- Blue Areas → Lower energy (quieter sounds)
- Horizontal Bands → Harmonic content (musical tones)
- Vertical Variations → Changes in rhythm, beat, and sound texture
Folders "01" to "10" represent different genres (each class is a genres)
- The MFCC spectrograms represent the spectral signature of each genre.

Each genre has characteristic spectral patterns:
- Classical/Jazz: smoother, more continuous patterns.
- Rock/Metal: more irregular and dynamic shifts in intensity.
- Disco: clear periodic patterns from the steady dance beat.
- Hip-Hop: strong low-frequency presence (bass-heavy).
Improve data visualization of the image above:
- Change the subplot size.
- Use cmap input option in librosa.display.specshow
# Creating subplots: 2 rows x 5 columns
fig, ax = plt.subplots(2, 5, figsize = (15, 7))

# Initializing row and column indices for the subplot grid
row = 0
column = 0

for digit in range(10):
    # Get one audio sample per class (0-9)
    audio_data, sample_rate = get_audio_raw(digit)
    # Extract its MFCC matrix
    mfcc = draw_spectrograms(audio_data, sample_rate)
    print(f"Shape of MFCC of audio digit {digit} ---> ", mfcc.shape)

    # Plot the MFCCs and set the subplot title
    ax[row,column].set_title(f"Class {digit}")
    librosa.display.specshow(mfcc, sr = 22050, ax = ax[row, column], cmap='tab20')  # cmap='tab20' is the only change from the previous cell

    # Set X-label and Y-label
    ax[row,column].set_xlabel("Time")
    ax[row,column].set_ylabel("MFCC Coefficients")

    # Advance to the next subplot position (fill each column's two rows, then move right)
    if row == 1:
        row = 0
        column += 1
    else:
        row += 1

fig.suptitle('MFCC of different audio classes')
plt.tight_layout(pad=1)
plt.show()
Shape of MFCC of audio digit 0 ---> (40, 1293) Shape of MFCC of audio digit 1 ---> (40, 1293) Shape of MFCC of audio digit 2 ---> (40, 1293) Shape of MFCC of audio digit 3 ---> (40, 1293) Shape of MFCC of audio digit 4 ---> (40, 1293) Shape of MFCC of audio digit 5 ---> (40, 1303) Shape of MFCC of audio digit 6 ---> (40, 1293) Shape of MFCC of audio digit 7 ---> (40, 1293) Shape of MFCC of audio digit 8 ---> (40, 1293) Shape of MFCC of audio digit 9 ---> (40, 1293)
Observations on the MFCC Spectrograms Across Music Genres
- Each plot represents the Mel Frequency Cepstral Coefficients (MFCCs) for different music genres. MFCCs capture the frequency characteristics of the audio signal, which helps in identifying unique patterns across genres.
Color Interpretation
- The color variations in the spectrograms represent different frequency intensities over time.
- The "tab20" colormap (used in the second plot) assigns distinct colors rather than a gradient-based heatmap, which categorizes different frequency components instead of showing intensity levels.
- Some spectrograms appear denser and more uniform, while others are more sparse and structured, indicating differences in harmonic and rhythmic complexity.
Genre-Specific Observations
- 01 - Blues
- Dark brown, dense spectrogram with scattered frequency variations.
- Blues music typically has strong mid-range frequencies with steady rhythms.
- Expect repeating patterns due to the classic 12-bar blues structure.
- 02 - Classical
- Lighter, structured frequency content.
- Classical music features rich harmonics and smooth variations.
- Sparse high-frequency content due to dominant string and orchestral instruments.
- 03 - Country
- Grayish pattern with noticeable separations.
- Country music typically has clear vocals and acoustic instruments.
- Expect steady mid-range energy with occasional high-frequency bursts (e.g., from string plucks).
- 04 - Disco
- Dense, uniform color patterns indicating strong rhythmic beats.
- Disco is bass-heavy with consistent mid and high-range frequencies.
- Expect periodic peaks due to the dance beat structure.
- 05 - Hip-hop
- Sparse frequency distribution with dominant low and mid-range energy.
- Hip-hop often has bass-heavy beats, with sharp peaks for percussive elements (kick & snare drums).
- Less harmonic complexity compared to classical or jazz.
- 06 - Jazz
- Balanced distribution with noticeable high-frequency components.
- Jazz has complex harmonic structures with frequent chord changes.
- Expect instrumental variation (saxophones, trumpets, pianos) contributing to rich harmonics.
- 07 - Metal
- Highly intense, dense spectrogram.
- Metal is distorted guitar-heavy with aggressive high-frequency components.
- Expect strong mid-high frequency dominance due to power chords & cymbals.
- 08 - Pop
- High-energy spectrogram with widespread frequency coverage.
- Pop songs typically have clear vocals and electronic beats.
- Expect consistency in spectral features due to polished production.
- 09 - Reggae
- More sparse compared to pop & metal, with distinct rhythm-based separations.
- Reggae has a relaxed beat structure, often with emphasis on offbeat rhythms.
- Mid-range dominance with occasional sharp high-frequency peaks.
- 10 - Rock
- Fairly dense spectrogram, not as intense as metal but still featuring strong mid-range energy.
- Rock music has consistent drum beats, electric guitars, and vocals.
- Expect noticeable frequency variation based on instrumentation.
Summary
- Classical and Jazz have smoother, structured harmonic-rich spectrograms.
- Blues, Rock, and Country share some mid-range similarities but differ in rhythmic structure.
- Metal and Hip-hop show intense frequency dominance in different areas.
- Pop and Disco exhibit structured, periodic patterns due to strong rhythmic consistency.
- Reggae has more gap-separated rhythmic structures, making it unique.
Perform Train-Test-Split¶
Split the data into train and test sets
# Import train_test_split function
from sklearn.model_selection import train_test_split
X = np.array(dataset['features'].to_list())
Y = np.array(dataset['class'].to_list()) - 1 # Fix: Convert from 1-10 to 0-9
# Create train set and test set
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, train_size=0.75, shuffle=True, random_state=8)
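The classes here are nearly balanced, so a plain shuffled split works; if you want to guarantee identical genre proportions in train and test, train_test_split also accepts a stratify argument (an optional variant, not used above):

# Optional variant: stratified split keeps all 10 genres equally represented
# (would replace the split above if executed)
X_train, X_test, Y_train, Y_test = train_test_split(
    X, Y, train_size=0.75, shuffle=True, stratify=Y, random_state=8)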
# Checking the shape of the data
X_train.shape
(749, 40)
X_train
array([[-5.7759789e+01, 9.5766479e+01, -1.0109319e+01, ...,
-3.7151713e+00, -4.5247555e+00, -3.6879799e+00],
[-9.4268730e+01, 9.1890869e+01, -1.7601385e+01, ...,
2.3584796e-01, -1.9522644e+00, -5.6698909e+00],
[-3.0045297e+02, 9.9252052e+01, 5.3022732e+01, ...,
-1.8707280e+00, -8.2629883e-01, -1.4659938e+00],
...,
[-2.2630965e+02, 7.8283951e+01, 7.8239198e+00, ...,
-1.5126579e+00, -3.6971474e-01, -3.4288461e+00],
[-1.0856484e+02, 6.9971283e+01, 1.4885138e+01, ...,
-2.9707897e+00, -1.7937195e+00, -2.3009326e+00],
[-1.0951281e+02, 9.7391228e+01, -2.0617918e+01, ...,
-1.3636816e+00, 1.3630954e+00, -1.4500473e+00]], dtype=float32)
Y_train
array([2, 9, 8, 7, 5, 6, 4, 9, 6, 8, 0, 3, 3, 7, 0, 4, 1, 0, 2, 7, 5, 9,
6, 6, 8, 5, 5, 2, 7, 5, 7, 3, 8, 3, 0, 8, 3, 1, 0, 7, 9, 2, 4, 1,
3, 4, 8, 2, 9, 4, 1, 9, 6, 9, 2, 7, 4, 8, 8, 9, 8, 1, 2, 3, 9, 1,
5, 4, 5, 8, 5, 5, 5, 7, 4, 3, 9, 7, 9, 0, 4, 2, 6, 3, 2, 9, 6, 6,
9, 0, 1, 5, 7, 4, 6, 5, 5, 8, 7, 2, 0, 0, 3, 0, 8, 9, 4, 1, 7, 0,
3, 1, 6, 9, 6, 5, 8, 5, 0, 4, 4, 7, 9, 5, 8, 3, 0, 4, 1, 8, 1, 2,
6, 1, 6, 3, 1, 1, 1, 5, 2, 4, 4, 3, 5, 0, 6, 7, 6, 5, 2, 3, 6, 6,
0, 2, 2, 0, 0, 9, 5, 8, 4, 9, 9, 8, 7, 5, 3, 3, 0, 7, 1, 5, 5, 2,
3, 2, 6, 2, 2, 5, 3, 8, 7, 3, 1, 3, 0, 0, 2, 8, 7, 9, 7, 5, 2, 0,
4, 8, 9, 1, 0, 6, 7, 4, 7, 2, 1, 2, 7, 2, 8, 7, 0, 6, 9, 5, 1, 5,
2, 6, 3, 9, 0, 3, 9, 6, 7, 7, 5, 1, 0, 9, 4, 6, 9, 7, 3, 1, 1, 8,
7, 2, 6, 5, 3, 6, 6, 6, 6, 8, 3, 3, 6, 7, 1, 6, 6, 8, 5, 6, 0, 6,
9, 7, 1, 9, 3, 2, 3, 5, 7, 5, 8, 8, 5, 7, 1, 1, 4, 8, 8, 8, 7, 7,
0, 9, 2, 7, 0, 5, 2, 9, 9, 7, 2, 9, 4, 0, 8, 3, 9, 8, 0, 0, 0, 5,
9, 0, 9, 9, 3, 9, 1, 6, 9, 7, 5, 0, 0, 8, 3, 4, 1, 4, 9, 9, 9, 4,
2, 9, 2, 4, 7, 0, 2, 3, 0, 0, 0, 2, 2, 1, 3, 5, 2, 0, 8, 7, 4, 5,
7, 0, 6, 5, 2, 7, 5, 8, 5, 2, 9, 8, 5, 4, 3, 0, 7, 3, 9, 2, 7, 2,
7, 9, 9, 8, 0, 5, 5, 5, 1, 9, 7, 1, 4, 9, 2, 4, 8, 3, 2, 5, 3, 8,
9, 8, 6, 1, 7, 1, 0, 6, 2, 1, 2, 8, 9, 5, 4, 4, 2, 1, 7, 9, 0, 5,
7, 4, 1, 9, 8, 6, 5, 5, 4, 9, 7, 3, 3, 8, 9, 8, 6, 5, 1, 6, 8, 6,
8, 7, 6, 1, 3, 1, 5, 7, 0, 6, 4, 0, 8, 3, 2, 0, 3, 7, 0, 1, 5, 6,
6, 1, 2, 7, 2, 7, 2, 4, 7, 4, 0, 2, 0, 0, 3, 0, 1, 2, 3, 7, 9, 6,
8, 6, 0, 4, 1, 2, 9, 3, 3, 0, 6, 2, 9, 8, 2, 1, 3, 0, 3, 0, 6, 6,
9, 7, 3, 2, 6, 4, 1, 0, 6, 6, 2, 4, 2, 8, 5, 8, 1, 3, 4, 5, 5, 5,
4, 4, 6, 2, 4, 7, 4, 8, 9, 2, 4, 3, 1, 4, 4, 8, 8, 2, 3, 6, 3, 8,
8, 1, 9, 1, 7, 1, 2, 4, 4, 8, 7, 3, 2, 4, 4, 3, 3, 2, 4, 8, 6, 6,
0, 0, 9, 9, 9, 1, 9, 5, 6, 5, 2, 0, 2, 4, 1, 8, 2, 9, 1, 7, 5, 1,
7, 7, 6, 4, 5, 3, 9, 7, 1, 8, 4, 7, 9, 5, 7, 4, 4, 1, 5, 1, 1, 6,
9, 5, 5, 3, 3, 9, 8, 5, 1, 6, 4, 4, 2, 1, 2, 0, 4, 6, 8, 6, 4, 7,
0, 6, 2, 2, 0, 2, 7, 1, 6, 7, 7, 1, 2, 1, 5, 3, 9, 0, 4, 7, 3, 3,
0, 8, 1, 3, 6, 9, 0, 6, 4, 3, 5, 4, 4, 3, 2, 6, 6, 2, 3, 1, 5, 1,
4, 0, 9, 8, 3, 5, 7, 0, 7, 5, 6, 0, 0, 5, 5, 4, 3, 4, 7, 8, 0, 4,
3, 7, 0, 8, 3, 4, 8, 1, 8, 8, 8, 9, 9, 5, 7, 4, 9, 1, 2, 5, 3, 3,
3, 1, 0, 9, 2, 6, 6, 4, 1, 9, 8, 0, 8, 0, 7, 3, 8, 1, 9, 1, 3, 3,
4])
Artificial Neural Networks (ANNs)¶
Modelling¶
Create an artificial neural network to recognize the genre.
About the libraries:
- Keras: Keras is an open-source deep-learning library in Python. Keras is popular because its API is clean and simple, allowing standard deep learning models to be defined, fit, and evaluated in just a few lines of code.
- Sklearn :
- Simple and efficient tools for predictive data analysis
- Accessible to everybody, and reusable in various contexts
- Built on NumPy, SciPy, and matplotlib
- Open source, commercially usable
Import necessary libraries for building the model
# To create an ANN model
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout
# To create a checkpoint and save the best model
# from tensorflow.keras.callbacks import ModelCheckpoint
# To load the model
from tensorflow.keras.models import load_model
# Input
from tensorflow.keras import Input
# To evaluate the model
from sklearn.metrics import classification_report, confusion_matrix
# from sklearn.preprocessing import LabelBinarizer
Model Creation¶
When we convert audio clips to averaged MFCC feature vectors, similar audio yields similar features regardless of who the performer is or what their pitch and timbre are like. Since each input here is a flat 40-value vector rather than a full spectrogram image, there is little local spatial structure for convolutions to exploit, so stacking convolutional layers on top of the fully connected layers would mostly add computational redundancy.
We will use a Sequential model with multiple fully connected hidden layers and a softmax output layer that returns a probability for each of the 10 genres.
- A Sequential model is a linear stack of layers. Sequential models can be created by giving a list of layer instances.
- A dense layer of neurons is a simple layer of neurons in which each neuron receives input from all of the neurons in the previous layer.
- The most popular activation function for hidden layers is the rectified linear unit, or ReLU, defined as ReLU(x) = max(0, x). It is popular because it is cheap to compute and avoids the saturation problems of other common activations like Sigmoid and Tanh.

The input shape specifies the shape of the input data; setting it correctly ensures the model can process the data.
# Create a Sequential object
model1 = Sequential()

# Set the input shape (40 MFCC features per sample)
model1.add(Input(shape=(40, )))

# --------------------------------------------------- Hidden Layers ---------------------------------------------------
# First hidden layer with 100 neurons (width chosen to be at least the 40 input features)
model1.add(Dense(100, activation = 'relu'))

# Second hidden layer with 100 neurons
model1.add(Dense(100, activation = 'relu'))

# Third hidden layer with 100 neurons
model1.add(Dense(100, activation = 'relu'))

# --------------------------------------------------- Output Layer ----------------------------------------------------
# Output layer with 10 neurons, one per class
model1.add(Dense(10, activation = 'softmax')) # softmax for multi-class classification

# Print a summary of the model
model1.summary()
Model: "sequential"
| Layer (type) | Output Shape | Param # |
|---|---|---|
| dense (Dense) | (None, 100) | 4,100 |
| dense_1 (Dense) | (None, 100) | 10,100 |
| dense_2 (Dense) | (None, 100) | 10,100 |
| dense_3 (Dense) | (None, 10) | 1,010 |
Total params: 25,310 (98.87 KB)
Trainable params: 25,310 (98.87 KB)
Non-trainable params: 0 (0.00 B)
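The parameter counts follow directly from dense-layer arithmetic (inputs × units weights, plus one bias per unit):

- dense: 40 × 100 + 100 = 4,100
- dense_1: 100 × 100 + 100 = 10,100
- dense_2: 100 × 100 + 100 = 10,100
- dense_3: 100 × 10 + 10 = 1,010
- Total: 25,310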
- CategoricalCrossentropy:
  - Labels must be provided in a one-hot representation, i.e., you one-hot encode the labels before feeding them to the model.
- SparseCategoricalCrossentropy:
  - Labels must be provided as plain integers; no one-hot encoding is needed (our case, since Y holds integers 0-9).

Both compute the same quantity; see the sketch below.
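A minimal sketch (with made-up labels and probabilities, not data from this notebook) showing that the two losses agree once the targets are expressed in the matching format:

import numpy as np
import tensorflow as tf

y_int = np.array([2, 0, 1])                                     # integer labels
y_onehot = tf.keras.utils.to_categorical(y_int, num_classes=3)  # one-hot labels
probs = tf.constant([[0.1, 0.2, 0.7],
                     [0.8, 0.1, 0.1],
                     [0.2, 0.6, 0.2]])                          # each row sums to 1

sparse = tf.keras.losses.SparseCategoricalCrossentropy()(y_int, probs)
dense = tf.keras.losses.CategoricalCrossentropy()(y_onehot, probs)
print(float(sparse), float(dense))  # identical: both average -log(p_true)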
# Compile the model
model1.compile(loss = 'sparse_categorical_crossentropy',
               metrics = ['accuracy'],
               optimizer = 'adam')
Model Checkpoint & Training¶
%%time
# Set the number of epochs for training
num_epochs = 100
# Set the batch size for training
batch_size = 32
# Fit the model
model1.fit(X_train,
           Y_train,
           validation_data = (X_test, Y_test),
           epochs = num_epochs,
           batch_size = batch_size,
           verbose = 1)
Epoch 1/100 24/24 ━━━━━━━━━━━━━━━━━━━━ 2s 16ms/step - accuracy: 0.1721 - loss: 7.4053 - val_accuracy: 0.2960 - val_loss: 2.3302 Epoch 2/100 24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 7ms/step - accuracy: 0.3757 - loss: 1.9830 - val_accuracy: 0.3920 - val_loss: 1.8912 Epoch 3/100 24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 7ms/step - accuracy: 0.4309 - loss: 1.6333 - val_accuracy: 0.3840 - val_loss: 1.7328 Epoch 4/100 24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 6ms/step - accuracy: 0.4708 - loss: 1.4402 - val_accuracy: 0.5000 - val_loss: 1.6047 Epoch 5/100 24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 7ms/step - accuracy: 0.5663 - loss: 1.2619 - val_accuracy: 0.4960 - val_loss: 1.4947 Epoch 6/100 24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 6ms/step - accuracy: 0.6301 - loss: 1.1299 - val_accuracy: 0.4800 - val_loss: 1.5686 Epoch 7/100 24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 6ms/step - accuracy: 0.6096 - loss: 1.1090 - val_accuracy: 0.4120 - val_loss: 1.6659 Epoch 8/100 24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 5ms/step - accuracy: 0.5946 - loss: 1.1142 - val_accuracy: 0.4760 - val_loss: 1.6022 Epoch 9/100 24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 5ms/step - accuracy: 0.6413 - loss: 0.9965 - val_accuracy: 0.5200 - val_loss: 1.4391 Epoch 10/100 24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 6ms/step - accuracy: 0.7032 - loss: 0.8600 - val_accuracy: 0.4560 - val_loss: 1.6319 Epoch 11/100 24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 6ms/step - accuracy: 0.6890 - loss: 0.8508 - val_accuracy: 0.5000 - val_loss: 1.5144 Epoch 12/100 24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 5ms/step - accuracy: 0.7197 - loss: 0.7505 - val_accuracy: 0.5120 - val_loss: 1.4340 Epoch 13/100 24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 6ms/step - accuracy: 0.7658 - loss: 0.7366 - val_accuracy: 0.5360 - val_loss: 1.4602 Epoch 14/100 24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 10ms/step - accuracy: 0.7646 - loss: 0.6575 - val_accuracy: 0.5160 - val_loss: 1.5311 Epoch 15/100 24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 9ms/step - accuracy: 0.7480 - loss: 0.7283 - val_accuracy: 0.5320 - val_loss: 1.5356 Epoch 16/100 24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 9ms/step - accuracy: 0.8244 - loss: 0.5501 - val_accuracy: 0.5360 - val_loss: 1.5338 Epoch 17/100 24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 9ms/step - accuracy: 0.8065 - loss: 0.5486 - val_accuracy: 0.5320 - val_loss: 1.5437 Epoch 18/100 24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 12ms/step - accuracy: 0.8285 - loss: 0.5388 - val_accuracy: 0.5440 - val_loss: 1.5237 Epoch 19/100 24/24 ━━━━━━━━━━━━━━━━━━━━ 1s 10ms/step - accuracy: 0.8305 - loss: 0.5253 - val_accuracy: 0.5160 - val_loss: 1.6184 Epoch 20/100 24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 9ms/step - accuracy: 0.7829 - loss: 0.5784 - val_accuracy: 0.5280 - val_loss: 1.6348 Epoch 21/100 24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 10ms/step - accuracy: 0.8299 - loss: 0.5161 - val_accuracy: 0.5040 - val_loss: 1.7883 Epoch 22/100 24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 11ms/step - accuracy: 0.7969 - loss: 0.5071 - val_accuracy: 0.5240 - val_loss: 1.6919 Epoch 23/100 24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 9ms/step - accuracy: 0.8248 - loss: 0.4730 - val_accuracy: 0.5360 - val_loss: 1.6463 Epoch 24/100 24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 6ms/step - accuracy: 0.8842 - loss: 0.3677 - val_accuracy: 0.5320 - val_loss: 1.6900 Epoch 25/100 24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 6ms/step - accuracy: 0.9007 - loss: 0.3588 - val_accuracy: 0.5640 - val_loss: 1.6553 Epoch 26/100 24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 6ms/step - accuracy: 0.8961 - loss: 0.3422 - val_accuracy: 0.5560 - val_loss: 1.6764 Epoch 27/100 24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 6ms/step - accuracy: 0.8942 - loss: 0.3223 - val_accuracy: 0.5280 - val_loss: 1.7465 Epoch 28/100 24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 6ms/step - accuracy: 0.8893 - loss: 0.3733 - val_accuracy: 
0.5320 - val_loss: 1.7637 Epoch 29/100 24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 7ms/step - accuracy: 0.9029 - loss: 0.2968 - val_accuracy: 0.5160 - val_loss: 1.7675 Epoch 30/100 24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 6ms/step - accuracy: 0.9392 - loss: 0.2535 - val_accuracy: 0.5560 - val_loss: 1.8146 Epoch 31/100 24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 6ms/step - accuracy: 0.9239 - loss: 0.2727 - val_accuracy: 0.5240 - val_loss: 1.7801 Epoch 32/100 24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 7ms/step - accuracy: 0.9418 - loss: 0.2583 - val_accuracy: 0.5600 - val_loss: 1.7930 Epoch 33/100 24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 6ms/step - accuracy: 0.9562 - loss: 0.1966 - val_accuracy: 0.5480 - val_loss: 1.8136 Epoch 34/100 24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 20ms/step - accuracy: 0.9549 - loss: 0.1962 - val_accuracy: 0.5360 - val_loss: 1.9375 Epoch 35/100 24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 6ms/step - accuracy: 0.9250 - loss: 0.2676 - val_accuracy: 0.5240 - val_loss: 1.9816 Epoch 36/100 24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 6ms/step - accuracy: 0.9317 - loss: 0.2384 - val_accuracy: 0.5520 - val_loss: 2.0031 Epoch 37/100 24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 6ms/step - accuracy: 0.9644 - loss: 0.1447 - val_accuracy: 0.5000 - val_loss: 2.1297 Epoch 38/100 24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 7ms/step - accuracy: 0.9639 - loss: 0.1728 - val_accuracy: 0.5600 - val_loss: 1.9049 Epoch 39/100 24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 7ms/step - accuracy: 0.9747 - loss: 0.1422 - val_accuracy: 0.5280 - val_loss: 1.9816 Epoch 40/100 24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 6ms/step - accuracy: 0.9832 - loss: 0.1293 - val_accuracy: 0.5480 - val_loss: 2.0905 Epoch 41/100 24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 6ms/step - accuracy: 0.9685 - loss: 0.1435 - val_accuracy: 0.5560 - val_loss: 2.0139 Epoch 42/100 24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 6ms/step - accuracy: 0.9752 - loss: 0.1304 - val_accuracy: 0.5400 - val_loss: 2.0129 Epoch 43/100 24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 6ms/step - accuracy: 0.9786 - loss: 0.1223 - val_accuracy: 0.5400 - val_loss: 2.0416 Epoch 44/100 24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 7ms/step - accuracy: 0.9773 - loss: 0.1224 - val_accuracy: 0.5240 - val_loss: 2.0511 Epoch 45/100 24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 6ms/step - accuracy: 0.9820 - loss: 0.1078 - val_accuracy: 0.5120 - val_loss: 2.1525 Epoch 46/100 24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 6ms/step - accuracy: 0.9852 - loss: 0.1061 - val_accuracy: 0.5160 - val_loss: 2.1780 Epoch 47/100 24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 6ms/step - accuracy: 0.9792 - loss: 0.1071 - val_accuracy: 0.5480 - val_loss: 2.1577 Epoch 48/100 24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 6ms/step - accuracy: 0.9851 - loss: 0.0919 - val_accuracy: 0.5440 - val_loss: 2.1966 Epoch 49/100 24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 6ms/step - accuracy: 0.9976 - loss: 0.0612 - val_accuracy: 0.5480 - val_loss: 2.2887 Epoch 50/100 24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 6ms/step - accuracy: 0.9837 - loss: 0.0945 - val_accuracy: 0.5040 - val_loss: 2.3680 Epoch 51/100 24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 6ms/step - accuracy: 0.9740 - loss: 0.1217 - val_accuracy: 0.5360 - val_loss: 2.3032 Epoch 52/100 24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 6ms/step - accuracy: 0.9852 - loss: 0.0860 - val_accuracy: 0.5320 - val_loss: 2.2783 Epoch 53/100 24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 6ms/step - accuracy: 0.9926 - loss: 0.0639 - val_accuracy: 0.5400 - val_loss: 2.2410 Epoch 54/100 24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 6ms/step - accuracy: 0.9918 - loss: 0.0550 - val_accuracy: 0.5320 - val_loss: 2.2745 Epoch 55/100 24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 6ms/step - accuracy: 0.9946 - loss: 0.0524 - val_accuracy: 0.5400 - val_loss: 2.3175 Epoch 56/100 24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 7ms/step - accuracy: 0.9961 
- loss: 0.0414 - val_accuracy: 0.5400 - val_loss: 2.3332 Epoch 57/100 24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 6ms/step - accuracy: 0.9958 - loss: 0.0412 - val_accuracy: 0.5440 - val_loss: 2.3241 Epoch 58/100 24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 6ms/step - accuracy: 0.9989 - loss: 0.0356 - val_accuracy: 0.5320 - val_loss: 2.3560 Epoch 59/100 24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 6ms/step - accuracy: 0.9978 - loss: 0.0358 - val_accuracy: 0.5360 - val_loss: 2.4160 Epoch 60/100 24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 6ms/step - accuracy: 0.9970 - loss: 0.0377 - val_accuracy: 0.5400 - val_loss: 2.3731 Epoch 61/100 24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 7ms/step - accuracy: 0.9982 - loss: 0.0337 - val_accuracy: 0.5320 - val_loss: 2.4049 Epoch 62/100 24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 10ms/step - accuracy: 0.9967 - loss: 0.0301 - val_accuracy: 0.5200 - val_loss: 2.4363 Epoch 63/100 24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 9ms/step - accuracy: 0.9963 - loss: 0.0294 - val_accuracy: 0.5240 - val_loss: 2.4332 Epoch 64/100 24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 9ms/step - accuracy: 0.9987 - loss: 0.0244 - val_accuracy: 0.5360 - val_loss: 2.4721 Epoch 65/100 24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 8ms/step - accuracy: 0.9966 - loss: 0.0384 - val_accuracy: 0.5360 - val_loss: 2.4973 Epoch 66/100 24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 10ms/step - accuracy: 0.9977 - loss: 0.0226 - val_accuracy: 0.5600 - val_loss: 2.5251 Epoch 67/100 24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 9ms/step - accuracy: 0.9986 - loss: 0.0261 - val_accuracy: 0.5320 - val_loss: 2.5316 Epoch 68/100 24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 9ms/step - accuracy: 0.9966 - loss: 0.0312 - val_accuracy: 0.5320 - val_loss: 2.5184 Epoch 69/100 24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 9ms/step - accuracy: 0.9904 - loss: 0.0449 - val_accuracy: 0.5640 - val_loss: 2.5873 Epoch 70/100 24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 9ms/step - accuracy: 0.9885 - loss: 0.0553 - val_accuracy: 0.5240 - val_loss: 2.5717 Epoch 71/100 24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 10ms/step - accuracy: 0.9986 - loss: 0.0251 - val_accuracy: 0.5480 - val_loss: 2.6514 Epoch 72/100 24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 10ms/step - accuracy: 0.9922 - loss: 0.0479 - val_accuracy: 0.5320 - val_loss: 2.7161 Epoch 73/100 24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 14ms/step - accuracy: 0.9922 - loss: 0.0362 - val_accuracy: 0.5240 - val_loss: 2.6747 Epoch 74/100 24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 6ms/step - accuracy: 0.9955 - loss: 0.0376 - val_accuracy: 0.5240 - val_loss: 2.7733 Epoch 75/100 24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 6ms/step - accuracy: 0.9404 - loss: 0.2041 - val_accuracy: 0.5440 - val_loss: 2.7325 Epoch 76/100 24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 8ms/step - accuracy: 0.9493 - loss: 0.2134 - val_accuracy: 0.5080 - val_loss: 2.7039 Epoch 77/100 24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 12ms/step - accuracy: 0.9624 - loss: 0.1325 - val_accuracy: 0.5360 - val_loss: 2.5794 Epoch 78/100 24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 9ms/step - accuracy: 0.9691 - loss: 0.0975 - val_accuracy: 0.5440 - val_loss: 2.7746 Epoch 79/100 24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 9ms/step - accuracy: 0.9751 - loss: 0.0712 - val_accuracy: 0.5400 - val_loss: 2.7646 Epoch 80/100 24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 10ms/step - accuracy: 0.9895 - loss: 0.0567 - val_accuracy: 0.5320 - val_loss: 2.5701 Epoch 81/100 24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 11ms/step - accuracy: 0.9999 - loss: 0.0226 - val_accuracy: 0.5280 - val_loss: 2.6910 Epoch 82/100 24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 19ms/step - accuracy: 0.9965 - loss: 0.0295 - val_accuracy: 0.5440 - val_loss: 2.6383 Epoch 83/100 24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 12ms/step - accuracy: 0.9965 - loss: 0.0197 - val_accuracy: 0.5600 - val_loss: 2.7159 Epoch 84/100 24/24 
━━━━━━━━━━━━━━━━━━━━ 1s 9ms/step - accuracy: 0.9988 - loss: 0.0120 - val_accuracy: 0.5440 - val_loss: 2.7271 Epoch 85/100 24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 10ms/step - accuracy: 0.9986 - loss: 0.0124 - val_accuracy: 0.5480 - val_loss: 2.7273 Epoch 86/100 24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 10ms/step - accuracy: 0.9986 - loss: 0.0102 - val_accuracy: 0.5400 - val_loss: 2.7168 Epoch 87/100 24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 6ms/step - accuracy: 0.9963 - loss: 0.0141 - val_accuracy: 0.5160 - val_loss: 2.7817 Epoch 88/100 24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 7ms/step - accuracy: 0.9978 - loss: 0.0171 - val_accuracy: 0.5520 - val_loss: 2.7674 Epoch 89/100 24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 6ms/step - accuracy: 0.9969 - loss: 0.0126 - val_accuracy: 0.5360 - val_loss: 2.7392 Epoch 90/100 24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 6ms/step - accuracy: 0.9964 - loss: 0.0161 - val_accuracy: 0.5280 - val_loss: 2.7826 Epoch 91/100 24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 6ms/step - accuracy: 0.9949 - loss: 0.0115 - val_accuracy: 0.5520 - val_loss: 2.7978 Epoch 92/100 24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 6ms/step - accuracy: 0.9997 - loss: 0.0078 - val_accuracy: 0.5280 - val_loss: 2.8056 Epoch 93/100 24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 6ms/step - accuracy: 0.9970 - loss: 0.0130 - val_accuracy: 0.5440 - val_loss: 2.8000 Epoch 94/100 24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 6ms/step - accuracy: 0.9993 - loss: 0.0071 - val_accuracy: 0.5360 - val_loss: 2.8448 Epoch 95/100 24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 6ms/step - accuracy: 0.9974 - loss: 0.0182 - val_accuracy: 0.5600 - val_loss: 2.8185 Epoch 96/100 24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 8ms/step - accuracy: 0.9990 - loss: 0.0081 - val_accuracy: 0.5520 - val_loss: 2.8214 Epoch 97/100 24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 6ms/step - accuracy: 0.9982 - loss: 0.0072 - val_accuracy: 0.5440 - val_loss: 2.8363 Epoch 98/100 24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 6ms/step - accuracy: 0.9979 - loss: 0.0085 - val_accuracy: 0.5360 - val_loss: 2.8516 Epoch 99/100 24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 7ms/step - accuracy: 0.9995 - loss: 0.0065 - val_accuracy: 0.5520 - val_loss: 2.8632 Epoch 100/100 24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 6ms/step - accuracy: 0.9994 - loss: 0.0075 - val_accuracy: 0.5440 - val_loss: 2.9018 CPU times: user 22.2 s, sys: 1.13 s, total: 23.3 s Wall time: 30.2 s
<keras.src.callbacks.history.History at 0x7cda2949b910>
Model Evaluation¶
# Make predictions on the test set
Y_pred = model1.predict(X_test)
Y_pred = [np.argmax(i) for i in Y_pred]
8/8 ━━━━━━━━━━━━━━━━━━━━ 0s 11ms/step
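The list comprehension works; np.argmax over axis 1 performs the same conversion in a single vectorized call (an equivalent sketch):

# Equivalent, vectorized form of the two lines above
probs = model1.predict(X_test)
Y_pred = np.argmax(probs, axis=1)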
# Set style as dark
sns.set_style("dark")
# Set figure size
plt.figure(figsize = (15, 8))
# Plot the title
plt.title("CONFUSION MATRIX FOR Song Genre PREDICTION")
# Confusion matrix
cm = confusion_matrix([int(x) for x in Y_test], Y_pred)
# Plot the confusion matrix as heatmap. Use BuPu
sns.heatmap(cm, annot = True, cmap = "BuPu", fmt = 'g', cbar = False)
# Set X-label and Y-label (rows of the matrix are actual classes, columns are predictions)
plt.xlabel("PREDICTED VALUES")
plt.ylabel("ACTUAL VALUES")
# Show the plot
plt.show()
# Print the metrics
print(classification_report(Y_test, Y_pred))
precision recall f1-score support
0 0.58 0.58 0.58 24
1 0.76 0.81 0.79 27
2 0.42 0.33 0.37 24
3 0.38 0.39 0.38 23
4 0.40 0.38 0.39 26
5 0.60 0.62 0.61 24
6 0.66 0.89 0.76 28
7 0.70 0.67 0.68 24
8 0.48 0.36 0.41 28
9 0.32 0.32 0.32 22
accuracy 0.54 250
macro avg 0.53 0.54 0.53 250
weighted avg 0.53 0.54 0.54 250
- Observations (the figures below come from an earlier run of the same model; exact numbers vary between runs, and the report above shows 54% accuracy)
  - Confusion Matrix
    - Best classified genres: Classical (22), Jazz (22), Metal (18), Hip-hop (15) correct along the diagonal.
    - High misclassification: Blues, Country, Disco, Hip-hop, Pop, Reggae.
    - Overlap observed: classes 4, 5, 8, and 9 are frequently confused with each other.
  - Classification Report
    - Overall accuracy: ~57% (moderate performance).
    - Best performing genres: Classical (0.79 F1), Metal (0.82 F1), Jazz (0.76 F1).
    - Weak genres: Reggae (0.42 F1), Hip-hop (0.42 F1), Disco (0.42 F1).
    - Macro avg: 0.58, Weighted avg: 0.59 (moderate class balance).
  - Key Insights
    - Distinctive genres (Classical, Jazz, Metal) perform well.
    - Overlapping genres (Pop, Disco, Hip-hop, Reggae) cause misclassification.
    - Improvements: more training data, feature engineering, CNN/LSTM models, data augmentation.
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout, BatchNormalization, Input
# Create a Sequential Model
model2 = Sequential([
    Input(shape=(40,)),              # Input layer
    Dense(256, activation='relu'),
    BatchNormalization(),            # Normalize activations
    Dropout(0.3),                    # Regularization
    Dense(128, activation='relu'),
    BatchNormalization(),
    Dropout(0.3),
    Dense(64, activation='relu'),
    Dense(10, activation='softmax')  # Output layer for 10 classes
])
# Print Summary of the model
model2.summary()
Model: "sequential_1"
| Layer (type) | Output Shape | Param # |
|---|---|---|
| dense_10 (Dense) | (None, 256) | 10,496 |
| batch_normalization (BatchNormalization) | (None, 256) | 1,024 |
| dropout (Dropout) | (None, 256) | 0 |
| dense_11 (Dense) | (None, 128) | 32,896 |
| batch_normalization_1 (BatchNormalization) | (None, 128) | 512 |
| dropout_1 (Dropout) | (None, 128) | 0 |
| dense_12 (Dense) | (None, 64) | 8,256 |
| dense_13 (Dense) | (None, 10) | 650 |
Total params: 53,834 (210.29 KB)
Trainable params: 53,066 (207.29 KB)
Non-trainable params: 768 (3.00 KB)
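The non-trainable parameters come from the BatchNormalization layers: each one learns a scale (gamma) and shift (beta) per feature (trainable) and also tracks a moving mean and variance per feature (non-trainable). For the 256- and 128-unit layers that gives 4 × 256 = 1,024 and 4 × 128 = 512 parameters respectively, of which 2 × (256 + 128) = 768 are the non-trainable moving statistics, matching the summary above.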
model2.compile(
    loss='sparse_categorical_crossentropy',
    optimizer='adam',
    metrics=['accuracy']
)
num_epochs = 100
batch_size = 32
history = model2.fit(
    X_train, Y_train,
    validation_data=(X_test, Y_test),
    epochs=num_epochs,
    batch_size=batch_size,
    verbose=1
)
Epoch 1/100 24/24 ━━━━━━━━━━━━━━━━━━━━ 3s 25ms/step - accuracy: 0.1614 - loss: 2.5925 - val_accuracy: 0.2240 - val_loss: 4.0188 Epoch 2/100 24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 12ms/step - accuracy: 0.3664 - loss: 1.8161 - val_accuracy: 0.2960 - val_loss: 3.1013 Epoch 3/100 24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 11ms/step - accuracy: 0.4089 - loss: 1.6690 - val_accuracy: 0.3520 - val_loss: 2.3665 Epoch 4/100 24/24 ━━━━━━━━━━━━━━━━━━━━ 1s 12ms/step - accuracy: 0.4676 - loss: 1.4953 - val_accuracy: 0.3920 - val_loss: 1.9178 Epoch 5/100 24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 11ms/step - accuracy: 0.4881 - loss: 1.5140 - val_accuracy: 0.4360 - val_loss: 1.5959 Epoch 6/100 24/24 ━━━━━━━━━━━━━━━━━━━━ 1s 13ms/step - accuracy: 0.5668 - loss: 1.3085 - val_accuracy: 0.4680 - val_loss: 1.4722 Epoch 7/100 24/24 ━━━━━━━━━━━━━━━━━━━━ 1s 34ms/step - accuracy: 0.5578 - loss: 1.2672 - val_accuracy: 0.5120 - val_loss: 1.3836 Epoch 8/100 24/24 ━━━━━━━━━━━━━━━━━━━━ 1s 19ms/step - accuracy: 0.5857 - loss: 1.1701 - val_accuracy: 0.5320 - val_loss: 1.3469 Epoch 9/100 24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 12ms/step - accuracy: 0.6076 - loss: 1.1591 - val_accuracy: 0.5440 - val_loss: 1.3474 Epoch 10/100 24/24 ━━━━━━━━━━━━━━━━━━━━ 1s 13ms/step - accuracy: 0.6334 - loss: 1.0489 - val_accuracy: 0.5600 - val_loss: 1.3353 Epoch 11/100 24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 13ms/step - accuracy: 0.6221 - loss: 1.0010 - val_accuracy: 0.5840 - val_loss: 1.2491 Epoch 12/100 24/24 ━━━━━━━━━━━━━━━━━━━━ 1s 8ms/step - accuracy: 0.6639 - loss: 0.9914 - val_accuracy: 0.5800 - val_loss: 1.2682 Epoch 13/100 24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 8ms/step - accuracy: 0.6158 - loss: 0.9932 - val_accuracy: 0.5800 - val_loss: 1.2800 Epoch 14/100 24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 7ms/step - accuracy: 0.7139 - loss: 0.8758 - val_accuracy: 0.5800 - val_loss: 1.2948 Epoch 15/100 24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 8ms/step - accuracy: 0.6930 - loss: 0.9128 - val_accuracy: 0.5800 - val_loss: 1.2961 Epoch 16/100 24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 8ms/step - accuracy: 0.6664 - loss: 0.9184 - val_accuracy: 0.5520 - val_loss: 1.3645 Epoch 17/100 24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 9ms/step - accuracy: 0.6872 - loss: 0.8436 - val_accuracy: 0.5720 - val_loss: 1.3270 Epoch 18/100 24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 8ms/step - accuracy: 0.7086 - loss: 0.8433 - val_accuracy: 0.6000 - val_loss: 1.3041 Epoch 19/100 24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 9ms/step - accuracy: 0.6948 - loss: 0.8479 - val_accuracy: 0.6000 - val_loss: 1.2850 Epoch 20/100 24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 8ms/step - accuracy: 0.7181 - loss: 0.8038 - val_accuracy: 0.5920 - val_loss: 1.3278 Epoch 21/100 24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 8ms/step - accuracy: 0.6989 - loss: 0.8127 - val_accuracy: 0.6120 - val_loss: 1.2606 Epoch 22/100 24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 8ms/step - accuracy: 0.7356 - loss: 0.7398 - val_accuracy: 0.6080 - val_loss: 1.2664 Epoch 23/100 24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 8ms/step - accuracy: 0.7410 - loss: 0.7351 - val_accuracy: 0.5800 - val_loss: 1.3309 Epoch 24/100 24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 7ms/step - accuracy: 0.7217 - loss: 0.8048 - val_accuracy: 0.5960 - val_loss: 1.2819 Epoch 25/100 24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 8ms/step - accuracy: 0.7509 - loss: 0.7457 - val_accuracy: 0.5920 - val_loss: 1.3517 Epoch 26/100 24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 8ms/step - accuracy: 0.7436 - loss: 0.6820 - val_accuracy: 0.5960 - val_loss: 1.3294 Epoch 27/100 24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 8ms/step - accuracy: 0.7671 - loss: 0.6754 - val_accuracy: 0.5600 - val_loss: 1.3472 Epoch 28/100 24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 7ms/step - accuracy: 0.7593 - loss: 0.6513 - 
val_accuracy: 0.6040 - val_loss: 1.3363 Epoch 29/100 24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 7ms/step - accuracy: 0.7648 - loss: 0.6582 - val_accuracy: 0.6280 - val_loss: 1.2918 Epoch 30/100 24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 9ms/step - accuracy: 0.7907 - loss: 0.6515 - val_accuracy: 0.6240 - val_loss: 1.3183 Epoch 31/100 24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 9ms/step - accuracy: 0.7754 - loss: 0.6526 - val_accuracy: 0.6160 - val_loss: 1.3323 Epoch 32/100 24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 7ms/step - accuracy: 0.7865 - loss: 0.5884 - val_accuracy: 0.6080 - val_loss: 1.3120 Epoch 33/100 24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 9ms/step - accuracy: 0.7692 - loss: 0.6057 - val_accuracy: 0.6120 - val_loss: 1.3141 Epoch 34/100 24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 8ms/step - accuracy: 0.7971 - loss: 0.5570 - val_accuracy: 0.6280 - val_loss: 1.2801 Epoch 35/100 24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 8ms/step - accuracy: 0.7934 - loss: 0.6168 - val_accuracy: 0.6480 - val_loss: 1.2784 Epoch 36/100 24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 7ms/step - accuracy: 0.8012 - loss: 0.5556 - val_accuracy: 0.6240 - val_loss: 1.3097 Epoch 37/100 24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 8ms/step - accuracy: 0.8218 - loss: 0.5205 - val_accuracy: 0.6040 - val_loss: 1.3306 Epoch 38/100 24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 8ms/step - accuracy: 0.7764 - loss: 0.6479 - val_accuracy: 0.6160 - val_loss: 1.3189 Epoch 39/100 24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 8ms/step - accuracy: 0.7940 - loss: 0.5536 - val_accuracy: 0.6160 - val_loss: 1.3918 Epoch 40/100 24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 9ms/step - accuracy: 0.8308 - loss: 0.4865 - val_accuracy: 0.6240 - val_loss: 1.3394 Epoch 41/100 24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 7ms/step - accuracy: 0.8504 - loss: 0.4926 - val_accuracy: 0.6200 - val_loss: 1.3789 Epoch 42/100 24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 7ms/step - accuracy: 0.8522 - loss: 0.4395 - val_accuracy: 0.6240 - val_loss: 1.3533 Epoch 43/100 24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 12ms/step - accuracy: 0.8349 - loss: 0.4627 - val_accuracy: 0.6320 - val_loss: 1.3493 Epoch 44/100 24/24 ━━━━━━━━━━━━━━━━━━━━ 1s 11ms/step - accuracy: 0.8405 - loss: 0.4351 - val_accuracy: 0.6200 - val_loss: 1.4002 Epoch 45/100 24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 12ms/step - accuracy: 0.8508 - loss: 0.4468 - val_accuracy: 0.6120 - val_loss: 1.4628 Epoch 46/100 24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 12ms/step - accuracy: 0.8310 - loss: 0.4709 - val_accuracy: 0.6240 - val_loss: 1.4210 Epoch 47/100 24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 11ms/step - accuracy: 0.8481 - loss: 0.4493 - val_accuracy: 0.5960 - val_loss: 1.3980 Epoch 48/100 24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 12ms/step - accuracy: 0.8584 - loss: 0.4333 - val_accuracy: 0.6160 - val_loss: 1.4414 Epoch 49/100 24/24 ━━━━━━━━━━━━━━━━━━━━ 1s 11ms/step - accuracy: 0.8293 - loss: 0.4359 - val_accuracy: 0.6040 - val_loss: 1.3537 Epoch 50/100 24/24 ━━━━━━━━━━━━━━━━━━━━ 1s 10ms/step - accuracy: 0.8759 - loss: 0.4010 - val_accuracy: 0.5960 - val_loss: 1.4421 Epoch 51/100 24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 7ms/step - accuracy: 0.8614 - loss: 0.3898 - val_accuracy: 0.6160 - val_loss: 1.4000 Epoch 52/100 24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 8ms/step - accuracy: 0.8672 - loss: 0.3979 - val_accuracy: 0.6160 - val_loss: 1.4364 Epoch 53/100 24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 8ms/step - accuracy: 0.8691 - loss: 0.4203 - val_accuracy: 0.5920 - val_loss: 1.4784 Epoch 54/100 24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 7ms/step - accuracy: 0.8871 - loss: 0.3592 - val_accuracy: 0.6040 - val_loss: 1.4443 Epoch 55/100 24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 7ms/step - accuracy: 0.8495 - loss: 0.4124 - val_accuracy: 0.6160 - val_loss: 1.4901 Epoch 56/100 24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 
8ms/step - accuracy: 0.8758 - loss: 0.3626 - val_accuracy: 0.6200 - val_loss: 1.4679 Epoch 57/100 24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 7ms/step - accuracy: 0.8611 - loss: 0.3785 - val_accuracy: 0.6240 - val_loss: 1.4573 Epoch 58/100 24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 8ms/step - accuracy: 0.8655 - loss: 0.3717 - val_accuracy: 0.6360 - val_loss: 1.4217 Epoch 59/100 24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 7ms/step - accuracy: 0.8577 - loss: 0.3781 - val_accuracy: 0.6320 - val_loss: 1.4463 Epoch 60/100 24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 8ms/step - accuracy: 0.8752 - loss: 0.3829 - val_accuracy: 0.6560 - val_loss: 1.4227 Epoch 61/100 24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 7ms/step - accuracy: 0.8634 - loss: 0.3975 - val_accuracy: 0.6440 - val_loss: 1.3981 Epoch 62/100 24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 8ms/step - accuracy: 0.8791 - loss: 0.3751 - val_accuracy: 0.6440 - val_loss: 1.4193 Epoch 63/100 24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 8ms/step - accuracy: 0.8520 - loss: 0.4202 - val_accuracy: 0.6240 - val_loss: 1.3619 Epoch 64/100 24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 7ms/step - accuracy: 0.8877 - loss: 0.3440 - val_accuracy: 0.6600 - val_loss: 1.4788 Epoch 65/100 24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 7ms/step - accuracy: 0.8700 - loss: 0.3814 - val_accuracy: 0.6120 - val_loss: 1.4434 Epoch 66/100 24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 7ms/step - accuracy: 0.8669 - loss: 0.3302 - val_accuracy: 0.6080 - val_loss: 1.5090 Epoch 67/100 24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 8ms/step - accuracy: 0.8739 - loss: 0.3579 - val_accuracy: 0.5960 - val_loss: 1.5128 Epoch 68/100 24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 7ms/step - accuracy: 0.8937 - loss: 0.3272 - val_accuracy: 0.6280 - val_loss: 1.5566 Epoch 69/100 24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 8ms/step - accuracy: 0.8669 - loss: 0.3397 - val_accuracy: 0.6320 - val_loss: 1.4911 Epoch 70/100 24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 7ms/step - accuracy: 0.8727 - loss: 0.3298 - val_accuracy: 0.6320 - val_loss: 1.5200 Epoch 71/100 24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 7ms/step - accuracy: 0.8729 - loss: 0.3065 - val_accuracy: 0.6320 - val_loss: 1.4923 Epoch 72/100 24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 7ms/step - accuracy: 0.8689 - loss: 0.3115 - val_accuracy: 0.6160 - val_loss: 1.5116 Epoch 73/100 24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 7ms/step - accuracy: 0.8708 - loss: 0.3342 - val_accuracy: 0.6120 - val_loss: 1.6131 Epoch 74/100 24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 7ms/step - accuracy: 0.8528 - loss: 0.3673 - val_accuracy: 0.6160 - val_loss: 1.5809 Epoch 75/100 24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 8ms/step - accuracy: 0.8910 - loss: 0.3101 - val_accuracy: 0.6120 - val_loss: 1.5044 Epoch 76/100 24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 8ms/step - accuracy: 0.8775 - loss: 0.3500 - val_accuracy: 0.6200 - val_loss: 1.4944 Epoch 77/100 24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 7ms/step - accuracy: 0.8823 - loss: 0.3157 - val_accuracy: 0.6040 - val_loss: 1.5729 Epoch 78/100 24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 9ms/step - accuracy: 0.8680 - loss: 0.3624 - val_accuracy: 0.5920 - val_loss: 1.5684 Epoch 79/100 24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 9ms/step - accuracy: 0.9012 - loss: 0.2642 - val_accuracy: 0.6280 - val_loss: 1.5203 Epoch 80/100 24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 8ms/step - accuracy: 0.9168 - loss: 0.2759 - val_accuracy: 0.6400 - val_loss: 1.4986 Epoch 81/100 24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 7ms/step - accuracy: 0.9055 - loss: 0.2722 - val_accuracy: 0.6480 - val_loss: 1.5127 Epoch 82/100 24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 7ms/step - accuracy: 0.8970 - loss: 0.2675 - val_accuracy: 0.6280 - val_loss: 1.5707 Epoch 83/100 24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 8ms/step - accuracy: 0.8880 - loss: 0.3103 - val_accuracy: 0.6080 - val_loss: 1.6644 Epoch 84/100 
24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 7ms/step - accuracy: 0.9083 - loss: 0.2493 - val_accuracy: 0.6120 - val_loss: 1.6682 Epoch 85/100 24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 7ms/step - accuracy: 0.9095 - loss: 0.2769 - val_accuracy: 0.6280 - val_loss: 1.6314 Epoch 86/100 24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 8ms/step - accuracy: 0.8885 - loss: 0.3101 - val_accuracy: 0.6440 - val_loss: 1.6029 Epoch 87/100 24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 12ms/step - accuracy: 0.9176 - loss: 0.2489 - val_accuracy: 0.6240 - val_loss: 1.6952 Epoch 88/100 24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 12ms/step - accuracy: 0.9038 - loss: 0.2822 - val_accuracy: 0.5920 - val_loss: 1.7332 Epoch 89/100 24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 12ms/step - accuracy: 0.9088 - loss: 0.2844 - val_accuracy: 0.6280 - val_loss: 1.6102 Epoch 90/100 24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 11ms/step - accuracy: 0.9135 - loss: 0.2518 - val_accuracy: 0.6320 - val_loss: 1.6249 Epoch 91/100 24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 12ms/step - accuracy: 0.9037 - loss: 0.2654 - val_accuracy: 0.6000 - val_loss: 1.6024 Epoch 92/100 24/24 ━━━━━━━━━━━━━━━━━━━━ 1s 12ms/step - accuracy: 0.8816 - loss: 0.2891 - val_accuracy: 0.6080 - val_loss: 1.5582 Epoch 93/100 24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 11ms/step - accuracy: 0.9085 - loss: 0.2587 - val_accuracy: 0.6400 - val_loss: 1.5490 Epoch 94/100 24/24 ━━━━━━━━━━━━━━━━━━━━ 1s 13ms/step - accuracy: 0.9103 - loss: 0.2508 - val_accuracy: 0.6120 - val_loss: 1.7080 Epoch 95/100 24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 7ms/step - accuracy: 0.9238 - loss: 0.2499 - val_accuracy: 0.6400 - val_loss: 1.6923 Epoch 96/100 24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 8ms/step - accuracy: 0.8938 - loss: 0.2576 - val_accuracy: 0.6360 - val_loss: 1.6316 Epoch 97/100 24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 7ms/step - accuracy: 0.9206 - loss: 0.2327 - val_accuracy: 0.6400 - val_loss: 1.6381 Epoch 98/100 24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 7ms/step - accuracy: 0.9276 - loss: 0.2366 - val_accuracy: 0.6080 - val_loss: 1.6746 Epoch 99/100 24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 7ms/step - accuracy: 0.9175 - loss: 0.2217 - val_accuracy: 0.5880 - val_loss: 1.7912 Epoch 100/100 24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 7ms/step - accuracy: 0.9107 - loss: 0.2364 - val_accuracy: 0.5960 - val_loss: 1.7483
import matplotlib.pyplot as plt
# Plot training & validation accuracy
plt.plot(history.history['accuracy'], label='Train Accuracy')
plt.plot(history.history['val_accuracy'], label='Val Accuracy')
plt.xlabel('Epochs')
plt.ylabel('Accuracy')
plt.legend()
plt.show()
# Plot loss
plt.plot(history.history['loss'], label='Train Loss')
plt.plot(history.history['val_loss'], label='Val Loss')
plt.xlabel('Epochs')
plt.ylabel('Loss')
plt.legend()
plt.show()
- Observations
- Overfitting Detected
- Training accuracy increases steadily and reaches ~90%, while validation accuracy stagnates around 60%.
- The widening gap suggests the model is memorizing training data but generalizing poorly.
- Validation Loss Divergence
- Training loss decreases consistently, indicating learning progress.
- Validation loss stops decreasing early and starts fluctuating (~epoch 20), confirming overfitting.
- Possible Fixes
  - Reduce overfitting: increase dropout, add L2 regularization (see the sketch after this list), or apply data augmentation.
  - Early stopping: stop training around epoch 20-30 to avoid further overfitting.
  - Try a different architecture: a CNN-based model may perform better on spectral features.
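As an illustration of the L2 option (a sketch only; the coefficient 1e-4 is an assumed starting point, not tuned here):

from tensorflow.keras import regularizers
from tensorflow.keras.layers import Dense

# L2-regularized hidden layer: adds lambda * sum(w^2) to the training loss
Dense(128, activation='relu', kernel_regularizer=regularizers.l2(1e-4))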
# Make predictions on the test set
Y_pred = model2.predict(X_test)
Y_pred = [np.argmax(i) for i in Y_pred]
8/8 ━━━━━━━━━━━━━━━━━━━━ 0s 16ms/step
# Set style as dark
sns.set_style("dark")
# Set figure size
plt.figure(figsize = (15, 8))
# Plot the title
plt.title("CONFUSION MATRIX FOR Song Genre PREDICTION")
# Confusion matrix
cm = confusion_matrix([int(x) for x in Y_test], Y_pred)
# Plot the confusion matrix as heatmap. Use BuPu
sns.heatmap(cm, annot = True, cmap = "BuPu", fmt = 'g', cbar = False)
# Set X-label and Y-label (rows of the matrix are actual classes, columns are predictions)
plt.xlabel("PREDICTED VALUES")
plt.ylabel("ACTUAL VALUES")
# Show the plot
plt.show()
# Print the metrics
print(classification_report(Y_test, Y_pred))
precision recall f1-score support
0 0.71 0.62 0.67 24
1 0.78 0.78 0.78 27
2 0.65 0.46 0.54 24
3 0.38 0.52 0.44 23
4 0.43 0.62 0.51 26
5 0.67 0.67 0.67 24
6 0.79 0.79 0.79 28
7 0.72 0.75 0.73 24
8 0.53 0.36 0.43 28
9 0.40 0.36 0.38 22
accuracy 0.60 250
macro avg 0.60 0.59 0.59 250
weighted avg 0.61 0.60 0.60 250
Model Performance Comparison (model1 "Before" vs model2 "After"; the per-class numbers come from one representative pair of runs, so they differ slightly from the reports printed above)
| Class | Precision (Before) | Precision (After) | Recall (Before) | Recall (After) | F1-Score (Before) | F1-Score (After) | Support |
|---|---|---|---|---|---|---|---|
| 0 | 0.67 | 0.64 | 0.50 | 0.67 | 0.57 | 0.65 | 24 |
| 1 | 0.76 | 0.76 | 0.81 | 0.81 | 0.79 | 0.79 | 27 |
| 2 | 0.50 | 0.56 | 0.42 | 0.58 | 0.45 | 0.57 | 24 |
| 3 | 0.37 | 0.40 | 0.48 | 0.43 | 0.42 | 0.42 | 23 |
| 4 | 0.42 | 0.54 | 0.42 | 0.50 | 0.42 | 0.52 | 26 |
| 5 | 0.68 | 0.67 | 0.62 | 0.67 | 0.65 | 0.67 | 24 |
| 6 | 0.73 | 0.70 | 0.79 | 0.82 | 0.76 | 0.75 | 28 |
| 7 | 0.90 | 0.73 | 0.75 | 0.79 | 0.82 | 0.76 | 24 |
| 8 | 0.44 | 0.64 | 0.39 | 0.50 | 0.42 | 0.56 | 28 |
| 9 | 0.37 | 0.53 | 0.50 | 0.41 | 0.42 | 0.46 | 22 |
Summary
| Metric | Before | After |
|---|---|---|
| Accuracy | 0.57 | 0.62 |
| Macro Avg F1 | 0.57 | 0.61 |
| Weighted Avg F1 | 0.58 | 0.62 |
Observations:
- Overall accuracy improved from 57% to 62%.
- Most F1-scores increased, showing better genre classification.
- Genres 0, 2, 3, 4, 8, 9 had notable precision and recall improvements.
- Genre 7 had a slight drop in precision but maintained good recall.
- Regularization and normalization likely stabilized the model.
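Next, try a deeper dense network (model3), with batch normalization and dropout after every hidden layer: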
model3 = Sequential([
    Input(shape=(40,)),              # MFCC has 40 features
    Dense(512, activation='relu'),
    BatchNormalization(),
    Dropout(0.3),
    Dense(256, activation='relu'),
    BatchNormalization(),
    Dropout(0.3),
    Dense(128, activation='relu'),
    BatchNormalization(),
    Dropout(0.3),
    Dense(64, activation='relu'),
    BatchNormalization(),
    Dropout(0.3),
    Dense(10, activation='softmax')  # 10 classes
])
model3.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
model3.summary()
Model: "sequential_2"
| Layer (type) | Output Shape | Param # |
|---|---|---|
| dense_14 (Dense) | (None, 512) | 20,992 |
| batch_normalization_2 (BatchNormalization) | (None, 512) | 2,048 |
| dropout_2 (Dropout) | (None, 512) | 0 |
| dense_15 (Dense) | (None, 256) | 131,328 |
| batch_normalization_3 (BatchNormalization) | (None, 256) | 1,024 |
| dropout_3 (Dropout) | (None, 256) | 0 |
| dense_16 (Dense) | (None, 128) | 32,896 |
| batch_normalization_4 (BatchNormalization) | (None, 128) | 512 |
| dropout_4 (Dropout) | (None, 128) | 0 |
| dense_17 (Dense) | (None, 64) | 8,256 |
| batch_normalization_5 (BatchNormalization) | (None, 64) | 256 |
| dropout_5 (Dropout) | (None, 64) | 0 |
| dense_18 (Dense) | (None, 10) | 650 |
Total params: 197,962 (773.29 KB)
Trainable params: 196,042 (765.79 KB)
Non-trainable params: 1,920 (7.50 KB)
from tensorflow.keras.callbacks import EarlyStopping

# Stop once val_loss hasn't improved for 10 consecutive epochs, then restore the best weights seen
early_stop = EarlyStopping(monitor='val_loss', patience=10, restore_best_weights=True)

model3.fit(
    X_train, Y_train,
    validation_data=(X_test, Y_test),
    epochs=150, batch_size=32,
    callbacks=[early_stop],
    verbose=1
)
Epoch 1/150 24/24 ━━━━━━━━━━━━━━━━━━━━ 5s 26ms/step - accuracy: 0.1228 - loss: 2.9445 - val_accuracy: 0.1960 - val_loss: 5.3707 Epoch 2/150 24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 12ms/step - accuracy: 0.3126 - loss: 2.0838 - val_accuracy: 0.2840 - val_loss: 3.4806 Epoch 3/150 24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 12ms/step - accuracy: 0.3622 - loss: 1.8702 - val_accuracy: 0.3680 - val_loss: 2.9951 Epoch 4/150 24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 12ms/step - accuracy: 0.3830 - loss: 1.7712 - val_accuracy: 0.3760 - val_loss: 2.3111 Epoch 5/150 24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 13ms/step - accuracy: 0.4205 - loss: 1.6564 - val_accuracy: 0.3960 - val_loss: 1.9312 Epoch 6/150 24/24 ━━━━━━━━━━━━━━━━━━━━ 1s 12ms/step - accuracy: 0.4482 - loss: 1.5810 - val_accuracy: 0.4640 - val_loss: 1.5538 Epoch 7/150 24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 11ms/step - accuracy: 0.4401 - loss: 1.5212 - val_accuracy: 0.5040 - val_loss: 1.4928 Epoch 8/150 24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 20ms/step - accuracy: 0.4667 - loss: 1.4911 - val_accuracy: 0.5480 - val_loss: 1.3615 Epoch 9/150 24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 18ms/step - accuracy: 0.5174 - loss: 1.3848 - val_accuracy: 0.5480 - val_loss: 1.3262 Epoch 10/150 24/24 ━━━━━━━━━━━━━━━━━━━━ 1s 21ms/step - accuracy: 0.5363 - loss: 1.3024 - val_accuracy: 0.5280 - val_loss: 1.3417 Epoch 11/150 24/24 ━━━━━━━━━━━━━━━━━━━━ 1s 18ms/step - accuracy: 0.5518 - loss: 1.3130 - val_accuracy: 0.5320 - val_loss: 1.3164 Epoch 12/150 24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 19ms/step - accuracy: 0.5347 - loss: 1.2810 - val_accuracy: 0.5680 - val_loss: 1.2725 Epoch 13/150 24/24 ━━━━━━━━━━━━━━━━━━━━ 1s 21ms/step - accuracy: 0.6173 - loss: 1.1125 - val_accuracy: 0.5640 - val_loss: 1.2819 Epoch 14/150 24/24 ━━━━━━━━━━━━━━━━━━━━ 1s 23ms/step - accuracy: 0.5888 - loss: 1.2067 - val_accuracy: 0.5400 - val_loss: 1.3178 Epoch 15/150 24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 12ms/step - accuracy: 0.5916 - loss: 1.1562 - val_accuracy: 0.5640 - val_loss: 1.2765 Epoch 16/150 24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 12ms/step - accuracy: 0.6538 - loss: 1.0487 - val_accuracy: 0.5720 - val_loss: 1.2659 Epoch 17/150 24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 13ms/step - accuracy: 0.6110 - loss: 1.1025 - val_accuracy: 0.5560 - val_loss: 1.2505 Epoch 18/150 24/24 ━━━━━━━━━━━━━━━━━━━━ 1s 11ms/step - accuracy: 0.6223 - loss: 1.1070 - val_accuracy: 0.5760 - val_loss: 1.2803 Epoch 19/150 24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 12ms/step - accuracy: 0.6518 - loss: 1.0476 - val_accuracy: 0.5480 - val_loss: 1.3092 Epoch 20/150 24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 11ms/step - accuracy: 0.6375 - loss: 1.0310 - val_accuracy: 0.5560 - val_loss: 1.3138 Epoch 21/150 24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 11ms/step - accuracy: 0.6734 - loss: 0.9847 - val_accuracy: 0.5480 - val_loss: 1.3523 Epoch 22/150 24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 12ms/step - accuracy: 0.6211 - loss: 1.0464 - val_accuracy: 0.5600 - val_loss: 1.3280 Epoch 23/150 24/24 ━━━━━━━━━━━━━━━━━━━━ 1s 12ms/step - accuracy: 0.6785 - loss: 0.9450 - val_accuracy: 0.5680 - val_loss: 1.2899 Epoch 24/150 24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 11ms/step - accuracy: 0.6449 - loss: 0.9460 - val_accuracy: 0.5480 - val_loss: 1.3180 Epoch 25/150 24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 12ms/step - accuracy: 0.6991 - loss: 0.9414 - val_accuracy: 0.5440 - val_loss: 1.3512 Epoch 26/150 24/24 ━━━━━━━━━━━━━━━━━━━━ 1s 11ms/step - accuracy: 0.7087 - loss: 0.8700 - val_accuracy: 0.5600 - val_loss: 1.3259 Epoch 27/150 24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 13ms/step - accuracy: 0.7406 - loss: 0.8079 - val_accuracy: 0.5880 - val_loss: 1.3140
<keras.src.callbacks.history.History at 0x7cda242909d0>
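Early stopping triggered at epoch 27: val_loss last improved at epoch 17 (1.2505), and with patience=10 training ran ten more epochs without improvement before stopping and restoring the best weights.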
# Make predictions on the test set
Y_pred = model3.predict(X_test)
Y_pred = [np.argmax(i) for i in Y_pred]
WARNING:tensorflow:5 out of the last 17 calls to <function TensorFlowTrainer.make_predict_function.<locals>.one_step_on_data_distributed at 0x7cda2470ac00> triggered tf.function retracing. Tracing is expensive and the excessive number of tracings could be due to (1) creating @tf.function repeatedly in a loop, (2) passing tensors with different shapes, (3) passing Python objects instead of tensors. For (1), please define your @tf.function outside of the loop. For (2), @tf.function has reduce_retracing=True option that can avoid unnecessary retracing. For (3), please refer to https://www.tensorflow.org/guide/function#controlling_retracing and https://www.tensorflow.org/api_docs/python/tf/function for more details.
8/8 ━━━━━━━━━━━━━━━━━━━━ 0s 21ms/step
# Set style as dark
sns.set_style("dark")
# Set figure size
plt.figure(figsize = (15, 8))
# Plot the title
plt.title("CONFUSION MATRIX FOR Song Genre PREDICTION")
# Confusion matrix
cm = confusion_matrix([int(x) for x in Y_test], Y_pred)
# Plot the confusion matrix as heatmap. Use BuPu
sns.heatmap(cm, annot = True, cmap = "BuPu", fmt = 'g', cbar = False)
# Set X-label and Y-label
plt.xlabel("ACTUAL VALUES")
plt.ylabel("PREDICTED VALUES")
# Show the plot
plt.show()
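Because each genre has a slightly different support in the test split, raw counts can be hard to compare across rows. A row-normalized view makes per-class recall directly readable; this is a small sketch that reuses the cm computed above:
# Row-normalize the confusion matrix so each row sums to 1;
# the diagonal then shows per-class recall directly
cm_norm = cm.astype(float) / cm.sum(axis=1, keepdims=True)
plt.figure(figsize=(15, 8))
sns.heatmap(cm_norm, annot=True, cmap="BuPu", fmt=".2f", cbar=False)
plt.xlabel("PREDICTED VALUES")
plt.ylabel("ACTUAL VALUES")
plt.show()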
# Print the metrics
print(classification_report(Y_test, Y_pred))
precision recall f1-score support
0 0.71 0.42 0.53 24
1 0.85 0.85 0.85 27
2 0.42 0.42 0.42 24
3 0.31 0.65 0.42 23
4 0.45 0.38 0.42 26
5 0.75 0.50 0.60 24
6 0.81 0.75 0.78 28
7 0.76 0.79 0.78 24
8 0.45 0.46 0.46 28
9 0.32 0.27 0.29 22
accuracy 0.56 250
macro avg 0.58 0.55 0.55 250
weighted avg 0.59 0.56 0.56 250
| Class | Precision (Run 1) | Recall (Run 1) | F1-score (Run 1) | Precision (Run 2) | Recall (Run 2) | F1-score (Run 2) | Precision (Run 3) | Recall (Run 3) | F1-score (Run 3) |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.67 | 0.50 | 0.57 | 0.64 | 0.67 | 0.65 | 0.67 | 0.50 | 0.57 |
| 1 | 0.76 | 0.81 | 0.79 | 0.76 | 0.81 | 0.79 | 0.83 | 0.93 | 0.88 |
| 2 | 0.50 | 0.42 | 0.45 | 0.56 | 0.58 | 0.57 | 0.42 | 0.33 | 0.37 |
| 3 | 0.37 | 0.48 | 0.42 | 0.40 | 0.43 | 0.42 | 0.35 | 0.52 | 0.42 |
| 4 | 0.42 | 0.42 | 0.42 | 0.54 | 0.50 | 0.52 | 0.39 | 0.46 | 0.42 |
| 5 | 0.68 | 0.62 | 0.65 | 0.67 | 0.67 | 0.67 | 0.64 | 0.67 | 0.65 |
| 6 | 0.73 | 0.79 | 0.76 | 0.70 | 0.82 | 0.75 | 0.70 | 0.75 | 0.72 |
| 7 | 0.90 | 0.75 | 0.82 | 0.73 | 0.79 | 0.76 | 0.86 | 0.79 | 0.82 |
| 8 | 0.44 | 0.39 | 0.42 | 0.64 | 0.50 | 0.56 | 0.61 | 0.50 | 0.55 |
| 9 | 0.37 | 0.50 | 0.42 | 0.53 | 0.41 | 0.46 | 0.33 | 0.27 | 0.30 |
| Accuracy | 0.57 | 0.57 | 0.57 | 0.62 | 0.62 | 0.62 | 0.58 | 0.57 | 0.58 |
| Macro Avg | 0.58 | 0.57 | 0.57 | 0.62 | 0.62 | 0.61 | 0.58 | 0.57 | 0.58 |
| Weighted Avg | 0.59 | 0.57 | 0.58 | 0.62 | 0.62 | 0.62 | 0.59 | 0.58 | 0.58 |
Observations:
- Best overall accuracy: Run 2 (0.62).
- Highest precision: class 1 in Run 3 (0.83).
- Highest recall: class 1 in Run 3 (0.93).
- Class 7 is consistently strong across runs.
- Classes 3 and 9 struggle in all runs (see the reproduction sketch below).
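For reference, the three-run comparison above can be reproduced with a loop along these lines. This is only a sketch: build_model3 is a hypothetical helper that returns a freshly compiled copy of the model3 architecture, and reseeding makes the runs differ only in initialization and shuffling.
import numpy as np
import tensorflow as tf
from sklearn.metrics import classification_report

run_reports = []
for seed in (0, 1, 2):
    tf.keras.utils.set_random_seed(seed)   # vary initialization/shuffling per run
    model = build_model3()                 # hypothetical: rebuild + recompile model3
    model.fit(X_train, Y_train, validation_data=(X_test, Y_test),
              epochs=150, batch_size=32, verbose=0)
    preds = np.argmax(model.predict(X_test), axis=1)
    run_reports.append(classification_report(Y_test, preds, output_dict=True))
# per-run accuracies, e.g. [0.57, 0.62, 0.58]
print([round(r["accuracy"], 2) for r in run_reports])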
CNN¶
Reshape Data for CNN
# Reshape input data for Conv1D, which expects (samples, steps, channels)
# Note: (40, 1) rather than (40, 1, 1) -- Conv1D takes 3-D input, matching input_shape=(40, 1) below
X_train_cnn = X_train.reshape(X_train.shape[0], X_train.shape[1], 1)
X_test_cnn = X_test.reshape(X_test.shape[0], X_test.shape[1], 1)
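A quick sanity check helps here, since Conv1D consumes a 3-D batch of shape (samples, steps, channels):
# Verify the reshape produced 3-D arrays with 40 MFCC steps and 1 channel
assert X_train_cnn.ndim == 3 and X_train_cnn.shape[1:] == (40, 1)
print(X_train_cnn.shape, X_test_cnn.shape)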
Define CNN Model
# Import necessary layers
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv1D, MaxPooling1D, Flatten, Dense, Dropout, BatchNormalization
# Create a Sequential CNN Model (1D for Audio)
model4 = Sequential([
# First Conv Layer
Conv1D(32, kernel_size=3, activation='relu', input_shape=(40, 1)),
BatchNormalization(),
MaxPooling1D(pool_size=2),
Dropout(0.3),
# Second Conv Layer
Conv1D(64, kernel_size=3, activation='relu'),
BatchNormalization(),
MaxPooling1D(pool_size=2),
Dropout(0.3),
# Third Conv Layer
Conv1D(128, kernel_size=3, activation='relu'),
BatchNormalization(),
MaxPooling1D(pool_size=2),
Dropout(0.3),
# Flatten & Dense Layers
Flatten(),
Dense(128, activation='relu'),
BatchNormalization(),
Dropout(0.4),
# Output Layer
Dense(10, activation='softmax') # 10 genres
])
# Print Summary
model4.summary()
Model: "sequential_6"
| Layer (type) | Output Shape | Param # |
|---|---|---|
| conv1d (Conv1D) | (None, 38, 32) | 128 |
| batch_normalization_21 (BatchNormalization) | (None, 38, 32) | 128 |
| max_pooling1d (MaxPooling1D) | (None, 19, 32) | 0 |
| dropout_21 (Dropout) | (None, 19, 32) | 0 |
| conv1d_1 (Conv1D) | (None, 17, 64) | 6,208 |
| batch_normalization_22 (BatchNormalization) | (None, 17, 64) | 256 |
| max_pooling1d_1 (MaxPooling1D) | (None, 8, 64) | 0 |
| dropout_22 (Dropout) | (None, 8, 64) | 0 |
| conv1d_2 (Conv1D) | (None, 6, 128) | 24,704 |
| batch_normalization_23 (BatchNormalization) | (None, 6, 128) | 512 |
| max_pooling1d_2 (MaxPooling1D) | (None, 3, 128) | 0 |
| dropout_23 (Dropout) | (None, 3, 128) | 0 |
| flatten_3 (Flatten) | (None, 384) | 0 |
| dense_25 (Dense) | (None, 128) | 49,280 |
| batch_normalization_24 (BatchNormalization) | (None, 128) | 512 |
| dropout_24 (Dropout) | (None, 128) | 0 |
| dense_26 (Dense) | (None, 10) | 1,290 |
Total params: 83,018 (324.29 KB)
Trainable params: 82,314 (321.54 KB)
Non-trainable params: 704 (2.75 KB)
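The counts in the summary can be verified by hand: a Conv1D layer has kernel_size × in_channels × filters weights plus filters biases, a Dense layer has inputs × units weights plus units biases, and BatchNormalization carries four parameters per channel (gamma, beta, moving mean, moving variance), of which only gamma and beta are trainable:
# Hand-checking the parameter counts from the summary above
print(3 * 1 * 32 + 32)            # conv1d:   128
print(3 * 32 * 64 + 64)           # conv1d_1: 6208
print(3 * 64 * 128 + 128)         # conv1d_2: 24704
print(384 * 128 + 128)            # dense_25: 49280
print(128 * 10 + 10)              # dense_26: 1290
print(2 * (32 + 64 + 128 + 128))  # non-trainable BN statistics: 704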
Compile the Model
# Compile the model
model4.compile(loss='sparse_categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
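sparse_categorical_crossentropy is the right loss here because the labels are integer class indices (0–9) rather than one-hot vectors; the two loss variants agree when given equivalent targets, as this small check illustrates:
import tensorflow as tf
y_int = tf.constant([3])             # integer label, as in Y_train
y_hot = tf.one_hot(y_int, depth=10)  # the one-hot equivalent
probs = tf.constant([[0.05, 0.05, 0.05, 0.5, 0.05, 0.05, 0.05, 0.1, 0.05, 0.05]])
print(tf.keras.losses.sparse_categorical_crossentropy(y_int, probs))
print(tf.keras.losses.categorical_crossentropy(y_hot, probs))  # same value: -log(0.5)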
Train the CNN Model
# Set Hyperparameters
num_epochs = 100
batch_size = 32
# Train Model
history = model4.fit(
X_train_cnn, Y_train,
validation_data=(X_test_cnn, Y_test),
epochs=num_epochs,
batch_size=batch_size,
verbose=1
)
Epoch 1/100 24/24 ━━━━━━━━━━━━━━━━━━━━ 6s 32ms/step - accuracy: 0.1527 - loss: 3.1711 - val_accuracy: 0.2320 - val_loss: 2.0631
Epoch 2/100 24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 15ms/step - accuracy: 0.2434 - loss: 2.3932 - val_accuracy: 0.2640 - val_loss: 1.9939
...
Epoch 99/100 24/24 ━━━━━━━━━━━━━━━━━━━━ 1s 15ms/step - accuracy: 0.7712 - loss: 0.6637 - val_accuracy: 0.6160 - val_loss: 1.2241
Epoch 100/100 24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 15ms/step - accuracy: 0.8038 - loss: 0.6003 - val_accuracy: 0.6000 - val_loss: 1.2220
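The log shows training accuracy climbing toward 0.80 while validation loss stops improving well before the last epoch, a classic overfitting signature. This is a sketch of the same fit call with early stopping and learning-rate scheduling added (the patience values are untuned assumptions):
from tensorflow.keras.callbacks import EarlyStopping, ReduceLROnPlateau

callbacks = [
    # stop when val_loss stalls and roll back to the best weights seen
    EarlyStopping(monitor='val_loss', patience=15, restore_best_weights=True),
    # halve the learning rate when val_loss plateaus
    ReduceLROnPlateau(monitor='val_loss', factor=0.5, patience=5),
]
history = model4.fit(X_train_cnn, Y_train,
                     validation_data=(X_test_cnn, Y_test),
                     epochs=num_epochs, batch_size=batch_size,
                     callbacks=callbacks, verbose=1)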
# Make predictions on the test set
Y_pred = model4.predict(X_test_cnn)  # predict on the reshaped CNN input, not the flat X_test
Y_pred = np.argmax(Y_pred, axis=1)   # take the highest-probability class per sample
WARNING:tensorflow:5 out of the last 17 calls to <function TensorFlowTrainer.make_predict_function.<locals>.one_step_on_data_distributed at 0x7cda2415c040> triggered tf.function retracing. Tracing is expensive and the excessive number of tracings could be due to (1) creating @tf.function repeatedly in a loop, (2) passing tensors with different shapes, (3) passing Python objects instead of tensors. For (1), please define your @tf.function outside of the loop. For (2), @tf.function has reduce_retracing=True option that can avoid unnecessary retracing. For (3), please refer to https://www.tensorflow.org/guide/function#controlling_retracing and https://www.tensorflow.org/api_docs/python/tf/function for more details.
8/8 ━━━━━━━━━━━━━━━━━━━━ 0s 29ms/step
# Set style as dark
sns.set_style("dark")
# Set figure size
plt.figure(figsize = (15, 8))
# Plot the title
plt.title("CONFUSION MATRIX FOR Song Genre PREDICTION")
# Confusion matrix
cm = confusion_matrix([int(x) for x in Y_test], Y_pred)
# Plot the confusion matrix as heatmap. Use BuPu
sns.heatmap(cm, annot = True, cmap = "BuPu", fmt = 'g', cbar = False)
# Set X-label and Y-label
plt.xlabel("ACTUAL VALUES")
plt.ylabel("PREDICTED VALUES")
# Show the plot
plt.show()
# Print the metrics
print(classification_report(Y_test, Y_pred))
precision recall f1-score support
0 0.86 0.50 0.63 24
1 0.77 0.85 0.81 27
2 0.39 0.46 0.42 24
3 0.32 0.26 0.29 23
4 0.55 0.69 0.61 26
5 0.56 0.62 0.59 24
6 0.85 0.82 0.84 28
7 0.72 0.88 0.79 24
8 0.57 0.46 0.51 28
9 0.40 0.36 0.38 22
accuracy 0.60 250
macro avg 0.60 0.59 0.59 250
weighted avg 0.61 0.60 0.59 250
Model Performance Comparison
| Model | Precision | Recall | F1-Score | Accuracy |
|---|---|---|---|---|
| Model 1 (ANN) | 0.58 | 0.57 | 0.57 | 0.57 |
| Model 2 (ANN) | 0.59 | 0.57 | 0.58 | 0.58 |
| Model 3 (ANN) | 0.62 | 0.62 | 0.61 | 0.62 |
| Model 4 (CNN) | 0.61 | 0.60 | 0.59 | 0.60 |
Model Ranking
- Model 3 (ANN, deeper network) – Best accuracy (0.62) and the strongest aggregate scores in the table above.
- Model 4 (CNN, 1D Conv) – Close behind (0.60 accuracy), with solid performance on the stronger genres.
- Model 2 (ANN, batch norm & dropout) – Moderate performance.
- Model 1 (Basic ANN) – Lowest performance.
Observation:
- Model 3 (ANN) and Model 4 (CNN) perform best.
- The CNN holds up reasonably evenly across classes, which suggests decent generalization.
- Model 3 (ANN) has slightly better aggregate accuracy, but its spread across the three runs above (0.56–0.62) hints at instability and possible overfitting.
- If generalization is the priority, the CNN is preferable.
Model 5¶
Reshape Data for CNN
X_train_cnn = X_train.reshape(X_train.shape[0], X_train.shape[1], 1) # (samples, 40, 1)
X_test_cnn = X_test.reshape(X_test.shape[0], X_test.shape[1], 1) # (samples, 40, 1)
Define CNN Model
model5 = Sequential([
Conv1D(32, kernel_size=3, activation='relu', input_shape=(40, 1)), # Ensure input_shape is (40,1)
BatchNormalization(),
MaxPooling1D(pool_size=2),
Dropout(0.3),
Conv1D(64, kernel_size=3, activation='relu'),
BatchNormalization(),
MaxPooling1D(pool_size=2),
Dropout(0.3),
Conv1D(128, kernel_size=3, activation='relu'),
BatchNormalization(),
MaxPooling1D(pool_size=2),
Dropout(0.3),
Flatten(),
Dense(128, activation='relu'),
BatchNormalization(),
Dropout(0.4),
Dense(10, activation='softmax') # 10 output classes
])
Compile the Model
# Compile the model
model5.compile(loss='sparse_categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
Train the CNN Model
# Set Hyperparameters
num_epochs = 100
batch_size = 32
# Train Model
history = model5.fit(
X_train_cnn, Y_train,
validation_data=(X_test_cnn, Y_test),
epochs=num_epochs,
batch_size=batch_size,
verbose=1
)
Epoch 1/100 24/24 ━━━━━━━━━━━━━━━━━━━━ 11s 49ms/step - accuracy: 0.1319 - loss: 3.1830 - val_accuracy: 0.2400 - val_loss: 2.1917
Epoch 2/100 24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 15ms/step - accuracy: 0.2038 - loss: 2.6141 - val_accuracy: 0.3120 - val_loss: 1.9182
...
Epoch 99/100 24/24 ━━━━━━━━━━━━━━━━━━━━ 1s 20ms/step - accuracy: 0.7922 - loss: 0.6392 - val_accuracy: 0.5840 - val_loss: 1.2336
Epoch 100/100 24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 15ms/step - accuracy: 0.7924 - loss: 0.6201 - val_accuracy: 0.5840 - val_loss: 1.2365
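The fit call above captures its History object, so the learning curves can be plotted directly; a quick sketch:
import matplotlib.pyplot as plt

# Plot train vs. validation accuracy per epoch from the captured History
plt.figure(figsize=(10, 4))
plt.plot(history.history['accuracy'], label='train accuracy')
plt.plot(history.history['val_accuracy'], label='validation accuracy')
plt.xlabel('Epoch')
plt.ylabel('Accuracy')
plt.legend()
plt.show()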
Evaluate and Predict
# Make predictions on the test set
Y_pred = model5.predict(X_test_cnn)
Y_pred = np.argmax(Y_pred, axis=1)  # take the highest-probability class per sample
8/8 ━━━━━━━━━━━━━━━━━━━━ 0s 28ms/step
# Set style as dark
sns.set_style("dark")
# Set figure size
plt.figure(figsize=(15, 8))
# Plot the title
plt.title("CONFUSION MATRIX FOR Song Genre PREDICTION")
# Confusion matrix
cm = confusion_matrix([int(x) for x in Y_test], Y_pred)
# Plot the confusion matrix as heatmap
sns.heatmap(cm, annot=True, cmap="BuPu", fmt='g', cbar=False)
# Set X-label and Y-label
plt.xlabel("ACTUAL VALUES")
plt.ylabel("PREDICTED VALUES")
# Show the plot
plt.show()
# Print the metrics
print(classification_report(Y_test, Y_pred))
precision recall f1-score support
0 0.79 0.46 0.58 24
1 0.91 0.74 0.82 27
2 0.50 0.54 0.52 24
3 0.31 0.39 0.35 23
4 0.60 0.58 0.59 26
5 0.48 0.50 0.49 24
6 0.80 0.86 0.83 28
7 0.66 0.88 0.75 24
8 0.58 0.50 0.54 28
9 0.30 0.32 0.31 22
accuracy 0.58 250
macro avg 0.59 0.58 0.58 250
weighted avg 0.60 0.58 0.59 250
Model Performance Comparison
| Model | Precision | Recall | F1-Score | Accuracy | Macro Avg (Precision) | Weighted Avg (Precision) |
|---|---|---|---|---|---|---|
| Model 1 | 0.58 | 0.57 | 0.57 | 0.57 | 0.58 | 0.59 |
| Model 2 | 0.62 | 0.62 | 0.61 | 0.62 | 0.62 | 0.62 |
| Model 3 | 0.59 | 0.57 | 0.57 | 0.57 | 0.58 | 0.59 |
| Model 4 (CNN) | 0.61 | 0.60 | 0.59 | 0.60 | 0.60 | 0.61 |
| Model 5 (Improved CNN) | 0.59 | 0.58 | 0.58 | 0.58 | 0.59 | 0.60 |
Model Rankings:
- Model 2 – Best overall performance (0.62 accuracy).
- Model 4 (CNN) – Slightly behind Model 2 but ahead of the other ANN models.
- Model 5 – Performance close to Model 4.
- Model 1 & Model 3 – Lower accuracy than the CNN models.
Conclusion & Recommendations¶
Model Ranking
- Model 2 (Best ANN) – Highest accuracy (0.62), best macro/weighted averages, and the most stable class-wise performance.
- Model 4 (Best CNN) – Close behind (accuracy 0.60), with stronger feature extraction but slightly lower recall than Model 2.
- Model 5 (Improved CNN) – Good feature learning (accuracy 0.58) but slightly weaker than Model 4.
- Model 3 (Enhanced ANN) – Better than Model 1 on several genres but outperformed by the CNN models.
- Model 1 (Baseline ANN) – Lowest accuracy (0.57); struggles to differentiate genres.
CNN models show promise but require optimization to outperform ANN models consistently.
Issues & Fixes
- Class Imbalance: Some genres (e.g., classical, metal, pop) perform well, while others (e.g., country, reggae) have lower recall. Consider data augmentation or class-weight balancing.
- Low Performance for Certain Genres: Country (class 3) and Reggae (class 9) are consistently misclassified. More MFCCs or spectral contrast features could improve performance.
- Overfitting Risk: CNN models show a tendency to overfit. Adjust dropout rates, batch normalization, and data augmentation to improve generalization.
- Epoch Reset: When training multiple models sequentially, epochs should be reset to prevent unintended training continuation.
- CNN vs ANN Trade-Off: CNNs extract features better but need deeper architectures or filter tuning to surpass Model 2.
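A minimal sketch of the class-weight balancing mentioned in the first item, assuming Y_train holds integer-valued genre labels:
import numpy as np
from sklearn.utils.class_weight import compute_class_weight

# Weight each genre inversely to its frequency in the training split
y = np.asarray(Y_train).astype(int)
classes = np.unique(y)
weights = compute_class_weight('balanced', classes=classes, y=y)
class_weight = dict(zip(classes, weights))
# then pass the mapping to training, e.g.:
# model4.fit(X_train_cnn, Y_train, class_weight=class_weight, ...)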
Final Takeaway
- Model 2 remains the best overall baseline.
- CNN models (Model 4 & 5) show potential but require further tuning.
- Address genre misclassification with better MFCC features and augmentation.
- Optimize CNNs with deeper layers, filter tuning, and learning rate adjustments.
- Experiment with hybrid architectures (CNN+LSTM) for sequential feature learning; a sketch follows.
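A minimal sketch of the suggested CNN+LSTM hybrid, under the same (40, 1) MFCC input used above (an untested idea, not a benchmarked model): Conv1D extracts local patterns, and the LSTM models the resulting feature sequence before classification.
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv1D, MaxPooling1D, LSTM, Dense, Dropout

hybrid = Sequential([
    Conv1D(32, kernel_size=3, activation='relu', input_shape=(40, 1)),
    MaxPooling1D(pool_size=2),
    Conv1D(64, kernel_size=3, activation='relu'),
    MaxPooling1D(pool_size=2),
    LSTM(64),              # consumes the (steps, channels) feature sequence
    Dropout(0.4),
    Dense(10, activation='softmax'),
])
hybrid.compile(loss='sparse_categorical_crossentropy',
               optimizer='adam', metrics=['accuracy'])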
# Google Colab: convert this notebook to HTML and download the result
path_ipynb = '/content/drive/MyDrive/My DS DA/Neural Networks (Deep Learning)/Music Classification/Music Genre Classification using Deep Learning.ipynb'
notebook_path = path_ipynb
!jupyter nbconvert --to html "{notebook_path}"
from google.colab import files
path_html = '/content/drive/MyDrive/My DS DA/Neural Networks (Deep Learning)/Music Classification/Music Genre Classification using Deep Learning.html'
files.download(path_html)
[NbConvertApp] Converting notebook /content/drive/MyDrive/My DS DA/Neural Networks (Deep Learning)/Music Classification/Music Genre Classification using Deep Learning.ipynb to html [NbConvertApp] WARNING | Alternative text is missing on 13 image(s). [NbConvertApp] Writing 9063356 bytes to /content/drive/MyDrive/My DS DA/Neural Networks (Deep Learning)/Music Classification/Music Genre Classification using Deep Learning.html